Udacity cs101 Intro to CS Notebook

Course can be found hereclassroom
Done in 2017-05-01

LESSONS

How to manage data

Measure Udacity

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# Define a procedure, measure_udacity,
# that takes as its input a list of strings,
# and returns a number that is a count
# of the number of elements in the input
# list that start with the uppercase
# letter 'U'.
def measure_udacity(word):
count = 0
for i in word:
if i.find('U')>=0:
print 'i: ',i
count+=1
print 'count: ',count
return count
print measure_udacity(['Dave','Sebastian','Katy'])
#>>> 0
print measure_udacity(['Umika','Umberto'])
#>>> 2

Find Element

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
# Define a procedure, find_element,
# that takes as its inputs a list
# and a value of any type, and
# returns the index of the first
# element in the input list that
# matches the value.
# If there is no matching element,
# return -1.
def find_element(U,u):
if u in U:
return U.index(u)
else:
return -1
print find_element([1,2,3],3)
#>>> 2
print find_element(['alpha','beta'],'gamma')
#>>> -1

Union

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
# Define a procedure, union,
# that takes as inputs two lists.
# It should modify the first input
# list to be the set union of the two
# lists. You may assume the first list
# is a set, that is, it contains no
# repeated elements.
def union(a,b):
for e in b:
if e not in a:
a.append(e)
# To test, uncomment all lines
# below except those beginning with >>>.
a = [1,2,3]
b = [2,4,6]
union(a,b)
print a
#>>> [1,2,3,4,6]
print b
#>>> [2,4,6]

Pop Quiz

31.1

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Inr_DYUqxk8.mp4

Quiz: Print All Links
print_all_links must keep going until there are no more links to print. Think about looping forever (set while loop condition) until there are no more links (i.e. else:). What do you do when there are no more links (body of else: condition)?

At 1:26, Dave uses a procedure get_page(). The code for this procedure is given later in the course, in Lesson 4. This is the code:

def get_page(url): 
    try: import urllib 
        return urllib.urlopen(url).read() 
    except: return '' 

Include this code above your get_next_target() procedure in your answer.

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/BFYeJzcejxM.mp4


https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/xssLR71EuUw.mp4


https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/m3oEwba-yxU.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/f4L30M_25AI.mp4


https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/8krkKyimMUA.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/4-5169UHZTM.mp4


https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/E3_IlnR_j44.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/LUnnw_TBxPI.mp4


Finishing the Web Crawler

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/BWKxjRDadkI.mp4


Crawling Process

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/l_ulQLpFQJQ.mp4

Quiz: Crawling Process
Pseudo code from the video:

start with tocrawl = [seed]
crawled = []
while there are more pages tocrawl:
    pick a page from tocrawl
    add that page to crawled
    add all the link targets on this page to tocrawl
return crawled

The seed page where crawling begins.


https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/_DESjvmuSsA.mp4


Crawl Web

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/bI3rP7tAGdA.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/XRkqyIvx39w.mp4


Crawl Web Loop

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/h4pJFmz7l1g.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/jRm4rYw1w6c.mp4


Crawl If

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/lZhKW6QTmX0.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/sopj7b5XEfk.mp4


Finishing Crawl Web

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/nQl4F9uMvGU.mp4

Quiz: Finishing Crawl Web
Hint: at some point, you will have to call get_page on page. It seems counterintuitive, but we use the word page to refer to both the url and the html of a webpage. The get_page procedure takes a url and returns html.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
#Finish crawl web
def get_page(url):
# This is a simulated get_page procedure so that you can test your
# code on two pages "http://xkcd.com/353" and "http://xkcd.com/554".
# A procedure which actually grabs a page from the web will be
# introduced in unit 4.
try:
if url == "http://xkcd.com/353":
return '<?xml version="1.0" encoding="utf-8" ?><?xml-stylesheet href="http://imgs.xkcd.com/s/c40a9f8.css" type="text/css" media="screen" ?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>xkcd: Python</title> <link rel="stylesheet" type="text/css" href="http://imgs.xkcd.com/s/c40a9f8.css" media="screen" title="Default" /> <!--[if IE]><link rel="stylesheet" type="text/css" href="http://imgs.xkcd.com/s/ecbbecc.css" media="screen" title="Default" /><![endif]--> <link rel="alternate" type="application/atom+xml" title="Atom 1.0" href="/atom.xml" /> <link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="/rss.xml" /> <link rel="icon" href="http://imgs.xkcd.com/s/919f273.ico" type="image/x-icon" /> <link rel="shortcut icon" href="http://imgs.xkcd.com/s/919f273.ico" type="image/x-icon" /> </head> <body> <div id="container"> <div id="topContainer"> <div id="topLeft" class="dialog"> <div class="hd"><div class="c"></div></div> <div class="bd"> <div class="c"> <div class="s">\t<ul> <li><a href="http://xkcd.com/554"">Archive</a><br /></li>\t <li><a href="http://blag.xkcd.com/">News/Blag</a><br /></li> <li><a href="http://store.xkcd.com/">Store</a><br /></li> <li><a href="/about/">About</a><br /></li> <li><a href="http://forums.xkcd.com/">Forums</a><br /></li> </ul> </div> </div> </div> <div class="ft"><div class="c"></div></div> </div> <div id="topRight" class="dialog"> <div class="hd"><div class="c"></div></div> <div class="bd"> <div class="c"> <div class="s"> <div id="topRightContainer"> <div id="logo"> <a href="/"><img src="http://imgs.xkcd.com/s/9be30a7.png" alt="xkcd.com logo" height="83" width="185"/></a> <h2><br />A webcomic of romance,<br/> sarcasm, math, and language.</h2> <div class="clearleft"></div> <br />XKCD updates every Monday, Wednesday, and Friday. </div> </div> </div> </div> </div> <div class="ft"><div class="c"></div></div> </div> </div> <div id="contentContainer"> <div id="middleContent" class="dialog"> <div class="hd"><div class="c"></div></div> <div class="bd"> <div class="c"> <div class="s"><h1>Python</h1><br/><br /><div class="menuCont"> <ul> <li><a href="/1/">|&lt;</a></li> <li><a href="/352/" accesskey="p">&lt; Prev</a></li> <li><a href="http://dynamic.xkcd.com/random/comic/" id="rnd_btn_t">Random</a></li> <li><a href="/354/" accesskey="n">Next &gt;</a></li> <li><a href="/">&gt;|</a></li> </ul></div><br/><br/><img src="http://imgs.xkcd.com/comics/python.png" title="I wrote 20 short programs in Python yesterday. It was wonderful. Perl, Im leaving you." alt="Python" /><br/><br/><div class="menuCont"> <ul> <li><a href="/1/">|&lt;</a></li> <li><a href="/352/" accesskey="p">&lt; Prev</a></li> <li><a href="http://dynamic.xkcd.com/random/comic/" id="rnd_btn_b">Random</a></li> <li><a href="/354/" accesskey="n">Next &gt;</a></li> <li><a href="/">&gt;|</a></li> </ul></div><h3>Permanent link to this comic: http://xkcd.com/353/</h3><h3>Image URL (for hotlinking/embedding): http://imgs.xkcd.com/comics/python.png</h3><div id="transcript" style="display: none">[[ Guy 1 is talking to Guy 2, who is floating in the sky ]]Guy 1: You39;re flying! How?Guy 2: Python!Guy 2: I learned it last night! Everything is so simple!Guy 2: Hello world is just 39;print &quot;Hello, World!&quot; 39;Guy 1: I dunno... Dynamic typing? Whitespace?Guy 2: Come join us! Programming is fun again! It39;s a whole new world up here!Guy 1: But how are you flying?Guy 2: I just typed 39;import antigravity39;Guy 1: That39;s it?Guy 2: ...I also sampled everything in the medicine cabinet for comparison.Guy 2: But i think this is the python.{{ I wrote 20 short programs in Python yesterday. It was wonderful. Perl, I39;m leaving you. }}</div> </div> </div> </div> <div class="ft"><div class="c"></div></div> </div> <div id="middleFooter" class="dialog"> <div class="hd"><div class="c"></div></div> <div class="bd"> <div class="c"> <div class="s"> <img src="http://imgs.xkcd.com/s/a899e84.jpg" width="520" height="100" alt="Selected Comics" usemap=" comicmap" /> <map name="comicmap"> <area shape="rect" coords="0,0,100,100" href="/150/" alt="Grownups" /> <area shape="rect" coords="104,0,204,100" href="/730/" alt="Circuit Diagram" /> <area shape="rect" coords="208,0,308,100" href="/162/" alt="Angular Momentum" /> <area shape="rect" coords="312,0,412,100" href="/688/" alt="Self-Description" /> <area shape="rect" coords="416,0,520,100" href="/556/" alt="Alternative Energy Revolution" /> </map><br/><br />Search comic titles and transcripts:<br /><script type="text/javascript" src="//www.google.com/jsapi"></script><script type="text/javascript"> google.load(\"search\", \"1\"); google.setOnLoadCallback(function() { google.search.CustomSearchControl.attachAutoCompletion( \"012652707207066138651:zudjtuwe28q\", document.getElementById(\"q\"), \"cse-search-box\"); });</script><form action="//www.google.com/cse" id="cse-search-box"> <div> <input type="hidden" name="cx" value="012652707207066138651:zudjtuwe28q" /> <input type="hidden" name="ie" value="UTF-8" /> <input type="text" name="q" id="q" autocomplete="off" size="31" /> <input type="submit" name="sa" value="Search" /> </div></form><script type="text/javascript" src="//www.google.com/cse/brand?form=cse-search-box&lang=en"></script><a href="/rss.xml">RSS Feed</a> - <a href="/atom.xml">Atom Feed</a><br /> <br/> <div id="comicLinks"> Comics I enjoy:<br/> <a href="http://www.qwantz.com">Dinosaur Comics</a>, <a href="http://www.asofterworld.com">A Softer World</a>, <a href="http://pbfcomics.com/">Perry Bible Fellowship</a>, <a href="http://www.boltcity.com/copper/">Copper</a>, <a href="http://questionablecontent.net/">Questionable Content</a>, <a href="http://achewood.com/">Achewood</a>, <a href="http://wondermark.com/">Wondermark</a>, <a href="http://thisisindexed.com/">Indexed</a>, <a href="http://www.buttercupfestival.com/buttercupfestival.htm">Buttercup Festival</a> </div> <br/> Warning: this comic occasionally contains strong language (which may be unsuitable for children), unusual humor (which may be unsuitable for adults), and advanced mathematics (which may be unsuitable for liberal-arts majors).<br/> <br/> <h4>We did not invent the algorithm. The algorithm consistently finds Jesus. The algorithm killed Jeeves. <br />The algorithm is banned in China. The algorithm is from Jersey. The algorithm constantly finds Jesus.<br />This is not the algorithm. This is close.</h4><br/> <div class="line"></div> <br/> <div id="licenseText"> <!-- <a rel="license" href="http://creativecommons.org/licenses/by-nc/2.5/"><img alt="Creative Commons License" style="border:none" src="http://imgs.xkcd.com/static/somerights20.png" /></a><br/> --> This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/2.5/">Creative Commons Attribution-NonCommercial 2.5 License</a>.<!-- <rdf:RDF xmlns="http://web.resource.org/cc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns "><Work rdf:about=""><dc:creator>Randall Munroe</dc:creator><dcterms:rightsHolder>Randall Munroe</dcterms:rightsHolder><dc:type rdf:resource="http://purl.org/dc/dcmitype/StillImage" /><dc:source rdf:resource="http://www.xkcd.com/"/><license rdf:resource="http://creativecommons.org/licenses/by-nc/2.5/" /></Work><License rdf:about="http://creativecommons.org/licenses/by-nc/2.5/"><permits rdf:resource="http://web.resource.org/cc/Reproduction" /><permits rdf:resource="http://web.resource.org/cc/Distribution" /><requires rdf:resource="http://web.resource.org/cc/Notice" /><requires rdf:resource="http://web.resource.org/cc/Attribution" /><prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" /><permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" /></License></rdf:RDF> --> <br/> This means you\"re free to copy and share these comics (but not to sell them). <a href="/license.html">More details</a>.<br/> </div> </div> </div> </div> <div class="ft"><div class="c"></div></div> </div> </div> </div> </body></html> '
elif url == "http://xkcd.com/554":
return '<?xml version="1.0" encoding="utf-8" ?> <?xml-stylesheet href="http://imgs.xkcd.com/s/c40a9f8.css" type="text/css" media="screen" ?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>xkcd: Not Enough Work</title> <link rel="stylesheet" type="text/css" href="http://imgs.xkcd.com/s/c40a9f8.css" media="screen" title="Default" /> <!--[if IE]><link rel="stylesheet" type="text/css" href="http://imgs.xkcd.com/s/ecbbecc.css" media="screen" title="Default" /><![endif]--> <link rel="alternate" type="application/atom+xml" title="Atom 1.0" href="/atom.xml" /> <link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="/rss.xml" /> <link rel="icon" href="http://imgs.xkcd.com/s/919f273.ico" type="image/x-icon" /> <link rel="shortcut icon" href="http://imgs.xkcd.com/s/919f273.ico" type="image/x-icon" /> </head> <body> <div id="container"> <div id="topContainer"> <div id="topLeft" class="dialog"> <div class="hd"><div class="c"></div></div> <div class="bd"> <div class="c"> <div class="s"> <ul> <li><a href="/archive/">Archive</a><br /></li> <li><a href="http://blag.xkcd.com/">News/Blag</a><br /></li> <li><a href="http://store.xkcd.com/">Store</a><br /></li> <li><a href="/about/">About</a><br /></li> <li><a href="http://forums.xkcd.com/">Forums</a><br /></li> </ul> </div> </div> </div> <div class="ft"><div class="c"></div></div> </div> <div id="topRight" class="dialog"> <div class="hd"><div class="c"></div></div> <div class="bd"> <div class="c"> <div class="s"> <div id="topRightContainer"> <div id="logo"> <a href="/"><img src="http://imgs.xkcd.com/s/9be30a7.png" alt="xkcd.com logo" height="83" width="185"/></a> <h2><br />A webcomic of romance,<br/> sarcasm, math, and language.</h2> <div class="clearleft"></div> XKCD updates every Monday, Wednesday, and Friday. <br /> Blag: Remember geohashing? <a href="http://blog.xkcd.com/2012/02/27/geohashing-2/">Something pretty cool</a> happened Sunday. </div> </div> </div> </div> </div> <div class="ft"><div class="c"></div></div> </div> </div> <div id="contentContainer"> <div id="middleContent" class="dialog"> <div class="hd"><div class="c"></div></div> <div class="bd"> <div class="c"> <div class="s"> <h1>Not Enough Work</h1><br/> <br /> <div class="menuCont"> <ul> <li><a href="/1/">|&lt;</a></li> <li><a href="/553/" accesskey="p">&lt; Prev</a></li> <li><a href="http://dynamic.xkcd.com/random/comic/" id="rnd_btn_t">Random</a></li> <li><a href="/555/" accesskey="n">Next &gt;</a></li> <li><a href="/">&gt;|</a></li> </ul> </div> <br/> <br/> <img src="http://imgs.xkcd.com/comics/not_enough_work.png" title="It39;s even harder if you39;re an asshole who pronounces &lt;&gt; brackets." alt="Not Enough Work" /><br/> <br/> <div class="menuCont"> <ul> <li><a href="/1/">|&lt;</a></li> <li><a href="/553/" accesskey="p">&lt; Prev</a></li> <li><a href="http://dynamic.xkcd.com/random/comic/" id="rnd_btn_b">Random</a></li> <li><a href="/555/" accesskey="n">Next &gt;</a></li> <li><a href="/">&gt;|</a></li> </ul> </div> <h3>Permanent link to this comic: http://xkcd.com/554/</h3> <h3>Image URL (for hotlinking/embedding): http://imgs.xkcd.com/comics/not_enough_work.png</h3> <div id="transcript" style="display: none">Narration: Signs your coders don39;t have enough work to do: [[A man sitting at his workstation; a female co-worker behind him]] Man: I39;m almost up to my old typing speed in dvorak [[Two men standing by a server rack]] Man 1: Our servers now support gopher. Man 1: Just in case. [[A woman standing near her workstation speaking to a male co-worker]] Woman: Our pages are now HTML, XHTML-STRICT, and haiku-compliant Man: Haiku? Woman: &lt;div class=&quot;main&quot;&gt; Woman: &lt;span id=&quot;marquee&quot;&gt; Woman: Blog!&lt; span&gt;&lt; div&gt; [[A woman sitting at her workstation]] Woman: Hey! Have you guys seen this webcomic? {{title text: It39;s even harder if you39;re an asshole who pronounces &lt;&gt; brackets.}}</div> </div> </div> </div> <div class="ft"><div class="c"></div></div> </div> <div id="middleFooter" class="dialog"> <div class="hd"><div class="c"></div></div> <div class="bd"> <div class="c"> <div class="s"> <img src="http://imgs.xkcd.com/s/a899e84.jpg" width="520" height="100" alt="Selected Comics" usemap=" comicmap" /> <map name="comicmap"> <area shape="rect" coords="0,0,100,100" href="/150/" alt="Grownups" /> <area shape="rect" coords="104,0,204,100" href="/730/" alt="Circuit Diagram" /> <area shape="rect" coords="208,0,308,100" href="/162/" alt="Angular Momentum" /> <area shape="rect" coords="312,0,412,100" href="/688/" alt="Self-Description" /> <area shape="rect" coords="416,0,520,100" href="/556/" alt="Alternative Energy Revolution" /> </map><br/><br /> Search comic titles and transcripts:<br /> <script type="text/javascript" src="//www.google.com/jsapi"></script> <script type="text/javascript"> google.load("search", "1"); google.search.CustomSearchControl.attachAutoCompletion( "012652707207066138651:zudjtuwe28q", document.getElementById("q"), "cse-search-box"); }); </script> <form action="//www.google.com/cse" id="cse-search-box"> <div> <input type="hidden" name="cx" value="012652707207066138651:zudjtuwe28q" /> <input type="hidden" name="ie" value="UTF-8" /> <input type="text" name="q" id="q" autocomplete="off" size="31" /> <input type="submit" name="sa" value="Search" /> </div> </form> <script type="text/javascript" src="//www.google.com/cse/brand?form=cse-search-box&lang=en"></script> <a href="/rss.xml">RSS Feed</a> - <a href="/atom.xml">Atom Feed</a> <br /> <br/> <div id="comicLinks"> Comics I enjoy:<br/> <a href="http://threewordphrase.com/">Three Word Phrase</a>, <a href="http://oglaf.com/">Oglaf</a> (nsfw), <a href="http://www.smbc-comics.com/">SMBC</a>, <a href="http://www.qwantz.com">Dinosaur Comics</a>, <a href="http://www.asofterworld.com">A Softer World</a>, <a href="http://buttersafe.com/">Buttersafe</a>, <a href="http://pbfcomics.com/">Perry Bible Fellowship</a>, <a href="http://questionablecontent.net/">Questionable Content</a>, <a href="http://www.buttercupfestival.com/buttercupfestival.htm">Buttercup Festival</a> </div> <br/> Warning: this comic occasionally contains strong language (which may be unsuitable for children), unusual humor (which may be unsuitable for adults), and advanced mathematics (which may be unsuitable for liberal-arts majors).<br/> <br/> <h4>We did not invent the algorithm. The algorithm consistently finds Jesus. The algorithm killed Jeeves. <br />The algorithm is banned in China. The algorithm is from Jersey. The algorithm constantly finds Jesus.<br />This is not the algorithm. This is close.</h4><br/> <div class="line"></div> <br/> <div id="licenseText"> <!-- <a rel="license" href="http://creativecommons.org/licenses/by-nc/2.5/"><img alt="Creative Commons License" style="border:none" src="http://imgs.xkcd.com/static/somerights20.png" /></a><br/> --> This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/2.5/">Creative Commons Attribution-NonCommercial 2.5 License</a>. <!-- <rdf:RDF xmlns="http://web.resource.org/cc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns "><Work rdf:about=""><dc:creator>Randall Munroe</dc:creator><dcterms:rightsHolder>Randall Munroe</dcterms:rightsHolder><dc:type rdf:resource="http://purl.org/dc/dcmitype/StillImage" /><dc:source rdf:resource="http://www.xkcd.com/"/><license rdf:resource="http://creativecommons.org/licenses/by-nc/2.5/" /></Work><License rdf:about="http://creativecommons.org/licenses/by-nc/2.5/"><permits rdf:resource="http://web.resource.org/cc/Reproduction" /><permits rdf:resource="http://web.resource.org/cc/Distribution" /><requires rdf:resource="http://web.resource.org/cc/Notice" /><requires rdf:resource="http://web.resource.org/cc/Attribution" /><prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" /><permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" /></License></rdf:RDF> --> <br/> This means you"re free to copy and share these comics (but not to sell them). <a href="/license.html">More details</a>.<br/> </div> </div> </div> </div> <div class="ft"><div class="c"></div></div> </div> </div> </div> </body> </html> '
except:
return ""
return ""
def get_next_target(page):
start_link = page.find('<a href=')
if start_link == -1:
return None, 0
start_quote = page.find('"', start_link)
end_quote = page.find('"', start_quote + 1)
url = page[start_quote + 1:end_quote]
return url, end_quote
def union(p,q):
for e in q:
if e not in p:
p.append(e)
def get_all_links(page):
links = []
while True:
url,endpos = get_next_target(page)
if url:
links.append(url)
page = page[endpos:]
else:
break
return links
def crawl_web(seed):
tocrawl = [seed]
crawled = []
while tocrawl:
page = tocrawl.pop()
if page not in crawled:
union(tocrawl,get_all_links(get_page(page)))
crawled.append(page)
return crawled

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/sII5zYOFywM.mp4


Conclusion

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Qm4wJi2Me6Y.mp4


Problem Set

Lists


https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/fzaaNzGDcCg.mp4


Mutating Lists


https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/kFEMVfPAP-A.mp4

Product List

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Define a procedure, product_list,
# that takes as input a list of numbers,
# and returns a number that is
# the result of multiplying all
# those numbers together.
def product_list(list_of_numbers):
tem = 1
for i in list_of_numbers:
tem = tem * i
return tem
print product_list([9])
#>>> 9
print product_list([1,2,3,4])
#>>> 24
print product_list([])
#>>> 1

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/RTPL87SBv6o.mp4


Greatest

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
# Define a procedure, greatest,
# that takes as input a list
# of positive numbers, and
# returns the greatest number
# in that list. If the input
# list is empty, the output
# should be 0.
def greatest(list_of_numbers):
tem = 0
for i in list_of_numbers:
if i>tem:
tem = i
return tem
print greatest([4,23,1])
#>>> 23
print greatest([])
#>>> 0

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/okAtgJROqgs.mp4

Lists of Lists

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
# Define a procedure, total_enrollment,
# that takes as an input a list of elements,
# where each element is a list containing
# three elements: a university name,
# the total number of students enrolled,
# and the annual tuition fees.
# The procedure should return two numbers,
# not a string,
# giving the total number of students
# enrolled at all of the universities
# in the list, and the total tuition fees
# (which is the sum of the number
# of students enrolled times the
# tuition fees for each university).
udacious_univs = [['Udacity',90000,0]]
usa_univs = [ ['California Institute of Technology',2175,37704],
['Harvard',19627,39849],
['Massachusetts Institute of Technology',10566,40732],
['Princeton',7802,37000],
['Rice',5879,35551],
['Stanford',19535,40569],
['Yale',11701,40500] ]
def total_enrollment(p):
total_stu = 0
total_fee = 0
for a,b,c in p:
total_stu = total_stu + b
total_fee = total_fee + b* c
return total_stu, total_fee
#print total_enrollment(udacious_univs)
#>>> (90000,0)
# The L is automatically added by Python to indicate a long
# number. If you are trying the question in an outside
# interpreter you might not see it.
#print total_enrollment(usa_univs)
#>>> (77285,3058581079)

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/xk4fB0yfq58.mp4

Max Pages

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
# The web crawler we built at the end of Unit 3 has some serious
# flaws if we were going to use it in a real crawler. One
# problem is if we start with a good seed page, it might
# run for an extremely long time (even forever, since the
# number of URLS on the web is not actually finite). This
# question and the following one explore two different ways
# to limit the pages that it can crawl.
# Modify the crawl_web procedure to take a second parameter,
# max_pages, that limits the number of pages to crawl.
# Your procedure should terminate the crawl after
# max_pages different pages have been crawled, or when
# there are no more pages to crawl.
# The following definition of get_page provides an interface
# to the website found at http://www.udacity.com/cs101x/index.html
# The function output order does not affect grading.
def get_page(url):
try:
if url == "http://www.udacity.com/cs101x/index.html":
return ('<html> <body> This is a test page for learning to crawl! '
'<p> It is a good idea to '
'<a href="http://www.udacity.com/cs101x/crawling.html">learn to '
'crawl</a> before you try to '
'<a href="http://www.udacity.com/cs101x/walking.html">walk</a> '
'or <a href="http://www.udacity.com/cs101x/flying.html">fly</a>. '
'</p> </body> </html> ')
elif url == "http://www.udacity.com/cs101x/crawling.html":
return ('<html> <body> I have not learned to crawl yet, but I '
'am quite good at '
'<a href="http://www.udacity.com/cs101x/kicking.html">kicking</a>.'
'</body> </html>')
elif url == "http://www.udacity.com/cs101x/walking.html":
return ('<html> <body> I cant get enough '
'<a href="http://www.udacity.com/cs101x/index.html">crawling</a>! '
'</body> </html>')
elif url == "http://www.udacity.com/cs101x/flying.html":
return ('<html> <body> The magic words are Squeamish Ossifrage! '
'</body> </html>')
except:
return ""
return ""
def get_next_target(page):
start_link = page.find('<a href=')
if start_link == -1:
return None, 0
start_quote = page.find('"', start_link)
end_quote = page.find('"', start_quote + 1)
url = page[start_quote + 1:end_quote]
return url, end_quote
def union(p,q):
for e in q:
if e not in p:
p.append(e)
def get_all_links(page):
links = []
while True:
url,endpos = get_next_target(page)
if url:
links.append(url)
page = page[endpos:]
else:
break
return links
def crawl_web(seed, max_pages):
tocrawl = [seed]
crawled = []
while tocrawl:
page = tocrawl.pop()
if page not in crawled and len(crawled) < max_pages :
union(tocrawl, get_all_links(get_page(page)))
crawled.append(page)
return crawled
print crawl_web("http://www.udacity.com/cs101x/index.html",1)
#>>> ['http://www.udacity.com/cs101x/index.html']
print crawl_web("http://www.udacity.com/cs101x/index.html",3)
#>>> ['http://www.udacity.com/cs101x/index.html',
#>>> 'http://www.udacity.com/cs101x/flying.html',
#>>> 'http://www.udacity.com/cs101x/walking.html']
print crawl_web("http://www.udacity.com/cs101x/index.html",500)
#>>> ['http://www.udacity.com/cs101x/index.html',
#>>> 'http://www.udacity.com/cs101x/flying.html',
#>>> 'http://www.udacity.com/cs101x/walking.html',
#>>> 'http://www.udacity.com/cs101x/crawling.html',
#>>> 'http://www.udacity.com/cs101x/kicking.html']

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Mh0Rw9fV9UU.mp4

Max Depth

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
#
# This question explores a different way (from the previous question)
# to limit the pages that it can crawl.
#
#######
# THREE GOLD STARS #
# Yes, we really mean it! This is really tough (but doable) unless
# you have some previous experience before this course.
# Modify the crawl_web procedure to take a second parameter,
# max_depth, that limits the depth of the search. We can
# define the depth of a page as the number of links that must
# be followed to reach that page starting from the seed page,
# that is, the length of the shortest path from the seed to
# the page. No pages whose depth exceeds max_depth should be
# included in the crawl.
#
# For example, if max_depth is 0, the only page that should
# be crawled is the seed page. If max_depth is 1, the pages
# that should be crawled are the seed page and every page that
# it links to directly. If max_depth is 2, the crawl should
# also include all pages that are linked to by these pages.
#
# Note that the pages in the crawl may be in any order.
#
# The following definition of get_page provides an interface
# to the website found at http://www.udacity.com/cs101x/index.html
# The function output order does not affect grading.
def get_page(url):
try:
if url == "http://www.udacity.com/cs101x/index.html":
return ('<html> <body> This is a test page for learning to crawl! '
'<p> It is a good idea to '
'<a href="http://www.udacity.com/cs101x/crawling.html">learn to '
'crawl</a> before you try to '
'<a href="http://www.udacity.com/cs101x/walking.html">walk</a> '
'or <a href="http://www.udacity.com/cs101x/flying.html">fly</a>. '
'</p> </body> </html> ')
elif url == "http://www.udacity.com/cs101x/crawling.html":
return ('<html> <body> I have not learned to crawl yet, but I '
'am quite good at '
'<a href="http://www.udacity.com/cs101x/kicking.html">kicking</a>.'
'</body> </html>')
elif url == "http://www.udacity.com/cs101x/walking.html":
return ('<html> <body> I cant get enough '
'<a href="http://www.udacity.com/cs101x/index.html">crawling</a>! '
'</body> </html>')
elif url == "http://www.udacity.com/cs101x/flying.html":
return ('<html> <body> The magic words are Squeamish Ossifrage! '
'</body> </html>')
elif url == "http://top.contributors/velak.html":
return ('<a href="http://top.contributors/jesyspa.html">'
'<a href="http://top.contributors/forbiddenvoid.html">')
elif url == "http://top.contributors/jesyspa.html":
return ('<a href="http://top.contributors/elssar.html">'
'<a href="http://top.contributors/kilaws.html">')
elif url == "http://top.contributors/forbiddenvoid.html":
return ('<a href="http://top.contributors/charlzz.html">'
'<a href="http://top.contributors/johang.html">'
'<a href="http://top.contributors/graemeblake.html">')
elif url == "http://top.contributors/kilaws.html":
return ('<a href="http://top.contributors/tomvandenbosch.html">'
'<a href="http://top.contributors/mathprof.html">')
elif url == "http://top.contributors/graemeblake.html":
return ('<a href="http://top.contributors/dreyescat.html">'
'<a href="http://top.contributors/angel.html">')
elif url == "A1":
return '<a href="B1"> <a href="C1"> '
elif url == "B1":
return '<a href="E1">'
elif url == "C1":
return '<a href="D1">'
elif url == "D1":
return '<a href="E1"> '
elif url == "E1":
return '<a href="F1"> '
except:
return ""
return ""
def get_next_target(page):
start_link = page.find('<a href=')
if start_link == -1:
return None, 0
start_quote = page.find('"', start_link)
end_quote = page.find('"', start_quote + 1)
url = page[start_quote + 1:end_quote]
return url, end_quote
def union(p,q):
for e in q:
if e not in p:
p.append(e)
def get_all_links(page):
links = []
while True:
url,endpos = get_next_target(page)
if url:
links.append(url)
page = page[endpos:]
else:
break
return links
def crawl_web(seed,max_depth):
tocrawl = [seed]
crawled = []
count = 0
while tocrawl:
page = tocrawl.pop()
if page not in crawled and count < max_depth:
union(tocrawl, get_all_links(get_page(page)))
crawled.append(page)
count +=1
return crawled
print crawl_web("http://www.udacity.com/cs101x/index.html",0)
#>>> ['http://www.udacity.com/cs101x/index.html']
print crawl_web("http://www.udacity.com/cs101x/index.html",1)
#>>> ['http://www.udacity.com/cs101x/index.html',
#>>> 'http://www.udacity.com/cs101x/flying.html',
#>>> 'http://www.udacity.com/cs101x/walking.html',
#>>> 'http://www.udacity.com/cs101x/crawling.html']
print crawl_web("http://www.udacity.com/cs101x/index.html",50)
#>>> ['http://www.udacity.com/cs101x/index.html',
#>>> 'http://www.udacity.com/cs101x/flying.html',
#>>> 'http://www.udacity.com/cs101x/walking.html',
#>>> 'http://www.udacity.com/cs101x/crawling.html',
#>>> 'http://www.udacity.com/cs101x/kicking.html']
print crawl_web("http://top.contributors/forbiddenvoid.html",2)
#>>> ['http://top.contributors/forbiddenvoid.html',
#>>> 'http://top.contributors/graemeblake.html',
#>>> 'http://top.contributors/angel.html',
#>>> 'http://top.contributors/dreyescat.html',
#>>> 'http://top.contributors/johang.html',
#>>> 'http://top.contributors/charlzz.html']
print crawl_web("A1",3)
#>>> ['A1', 'C1', 'B1', 'E1', 'D1', 'F1']
# (May be in any order)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
def crawl_web(seed,max_depth):
tocrawl = [seed]
crawled = []
next_depth = []
depth = 0
while tocrawl and depth <= max_depth:
page = tocrawl.pop()
if page not in crawled:
union(next_depth, get_all_links(get_page(page)))
crawled.append(page)
if not tocrawl:
tocrawl, next_depth = next_depth, []
depth = depth + 1
return crawled

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TRNyIIrB73Q.mp4

Sudoku

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
# THREE GOLD STARS
# Sudoku [http://en.wikipedia.org/wiki/Sudoku]
# is a logic puzzle where a game
# is defined by a partially filled
# 9 x 9 square of digits where each square
# contains one of the digits 1,2,3,4,5,6,7,8,9.
# For this question we will generalize
# and simplify the game.
# Define a procedure, check_sudoku,
# that takes as input a square list
# of lists representing an n x n
# sudoku puzzle solution and returns the boolean
# True if the input is a valid
# sudoku square and returns the boolean False
# otherwise.
# A valid sudoku square satisfies these
# two properties:
# 1. Each column of the square contains
# each of the whole numbers from 1 to n exactly once.
# 2. Each row of the square contains each
# of the whole numbers from 1 to n exactly once.
# You may assume the the input is square and contains at
# least one row and column.
correct = [[1,2,3],
[2,3,1],
[3,1,2]]
incorrect = [[1,2,3,4],
[2,3,1,3],
[3,1,2,3],
[4,4,4,4]]
incorrect2 = [[1,2,3,4],
[2,3,1,4],
[4,1,2,3],
[3,4,1,2]]
incorrect3 = [[1,2,3,4,5],
[2,3,1,5,6],
[4,5,2,1,3],
[3,4,5,2,1],
[5,6,4,3,2]]
incorrect4 = [['a','b','c'],
['b','c','a'],
['c','a','b']]
incorrect5 = [ [1, 1.5],
[1.5, 1]]
def check_sudoku():
#print check_sudoku(incorrect)
#>>> False
#print check_sudoku(correct)
#>>> True
#print check_sudoku(incorrect2)
#>>> False
#print check_sudoku(incorrect3)
#>>> False
#print check_sudoku(incorrect4)
#>>> False
#print check_sudoku(incorrect5)
#>>> False

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/60wESSZRSp0.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
def check_sudoku(p):
n = len(p)
digit = 1
while digit <=n:
i = 0
while i < n:
row_count = 0
col_count = 0
j = 0
while j < n:
if p[i][j] == digit:
row_count = row_count + 1
if p[j][i] == digit:
col_count = col_count + 1
j = j + 1
if row_count != 1 or col_count != 1:
return False
i = i+1
digit = digit + 1
return True

Problem Set(Optional)

Exploring List Properties

1
2
3
4
5
6
7
8
9
10
11
12
13
14
# Investigating adding and appending to lists
# If you run the following four lines of codes, what are list1 and list2?
list1 = [1,2,3,4]
list2 = [1,2,3,4]
list1 = list1 + [5, 6]
list2.append([5, 6])
# to check, you can print them out using the print statements below.
print "showing list1 and list2:"
print list1
print list2
showing list1 and list2:
[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, [5, 6]]

Symmetric Square

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
# A list is symmetric if the first row is the same as the first column,
# the second row is the same as the second column and so on. Write a
# procedure, symmetric, which takes a list as input, and returns the
# boolean True if the list is symmetric and False if it is not.
def symmetric(p):
# Your code here
n = len(p)
if n == 0:
return True
if len(p[0]) != n:
return False
i = 0
while i < n:
j = 0
while j < n:
if p[i][j] != p[j][i]:
return False
j = j +1
i = i +1
return True
print symmetric([[1, 2, 3],
[2, 3, 4],
[3, 4, 1]])
#>>> True
print symmetric([["cat", "dog", "fish"],
["dog", "dog", "fish"],
["fish", "fish", "cat"]])
#>>> True
print symmetric([["cat", "dog", "fish"],
["dog", "dog", "dog"],
["fish","fish","cat"]])
#>>> False
print symmetric([[1, 2],
[2, 1]])
#>>> True
print symmetric([[1, 2, 3, 4],
[2, 3, 4, 5],
[3, 4, 5, 6]])
#>>> False
print symmetric([[1,2,3],
[2,3,1]])
#>>> False

Mean of a List

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
# The mean of a set of numbers is the sum of the numbers divided by the
# number of numbers. Write a procedure, list_mean, which takes a list of numbers
# as its input and return the mean of the numbers in the list.
# Hint: You will need to work out how to make your division into decimal
# division instead of integer division. You get decimal division if any of
# the numbers involved are decimals.
def list_mean(p):
n = len(p)
if n ==0:
return -1
sum = 0
for i in p:
sum = sum+i
return sum*1.0 / n
print list_mean([1,2,3,4])
#>>> 2.5
print list_mean([1,3,4,5,2])
#>>> 3.0
print list_mean([])
#>>> ??? You decide. If you decide it should give an error, comment
# out the print line above to prevent your code from being graded as
# incorrect.
print list_mean([2])
#>>> 2.0

Notes on lists

Problem Set(Optional 2)

Antisymmetric Square

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
# By Dimitris_GR from forums
# Modify Problem Set 31's (Optional) Symmetric Square to return True
# if the given square is antisymmetric and False otherwise.
# An nxn square is called antisymmetric if A[i][j]=-A[j][i]
# for each i=0,1,...,n-1 and for each j=0,1,...,n-1.
def antisymmetric(A):
#Write your code here
# Test Cases:
print antisymmetric([[0, 1, 2],
[-1, 0, 3],
[-2, -3, 0]])
#>>> True
print antisymmetric([[0, 0, 0],
[0, 0, 0],
[0, 0, 0]])
#>>> True
print antisymmetric([[0, 1, 2],
[-1, 0, -2],
[2, 2, 3]])
#>>> False
print antisymmetric([[1, 2, 5],
[0, 1, -9],
[0, 0, 1]])
#>>> False

Recognize Identity Matrix

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
# By Ashwath from forums
# Given a list of lists representing a n * n matrix as input,
# define a procedure that returns True if the input is an identity matrix
# and False otherwise.
# An IDENTITY matrix is a square matrix in which all the elements
# on the principal/main diagonal are 1 and all the elements outside
# the principal diagonal are 0.
# (A square matrix is a matrix in which the number of rows
# is equal to the number of columns)
def is_identity_matrix(matrix):
#Write your code here
# Test Cases:
matrix1 = [[1,0,0,0],
[0,1,0,0],
[0,0,1,0],
[0,0,0,1]]
print is_identity_matrix(matrix1)
#>>>True
matrix2 = [[1,0,0],
[0,1,0],
[0,0,0]]
print is_identity_matrix(matrix2)
#>>>False
matrix3 = [[2,0,0],
[0,2,0],
[0,0,2]]
print is_identity_matrix(matrix3)
#>>>False
matrix4 = [[1,0,0,0],
[0,1,1,0],
[0,0,0,1]]
print is_identity_matrix(matrix4)
#>>>False
matrix5 = [[1,0,0,0,0,0,0,0,0]]
print is_identity_matrix(matrix5)
#>>>False
matrix6 = [[1,0,0,0],
[0,1,0,1],
[0,0,1,0],
[0,0,0,1]]
print is_identity_matrix(matrix6)
#>>>False
matrix7 = [[1, -1, 1],
[0, 1, 0],
[0, 0, 1]]
print is_identity_matrix(matrix7)
#>>>False

Numbers in Lists

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
# Numbers in lists by SeanMc from forums
# define a procedure that takes in a string of numbers from 1-9 and
# outputs a list with the following parameters:
# Every number in the string should be inserted into the list.
# If a number x in the string is less than or equal
# to the preceding number y, the number x should be inserted
# into a sublist. Continue adding the following numbers to the
# sublist until reaching a number z that
# is greater than the number y.
# Then add this number z to the normal list and continue.
#Hint - "int()" turns a string's element into a number
def numbers_in_lists(string):
# YOUR CODE
#testcases
string = '543987'
result = [5,[4,3],9,[8,7]]
print repr(string), numbers_in_lists(string) == result
string= '987654321'
result = [9,[8,7,6,5,4,3,2,1]]
print repr(string), numbers_in_lists(string) == result
string = '455532123266'
result = [4, 5, [5, 5, 3, 2, 1, 2, 3, 2], 6, [6]]
print repr(string), numbers_in_lists(string) == result
string = '123456789'
result = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print repr(string), numbers_in_lists(string) == result

Frequency Analysis

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
# Crypto Analysis: Frequency Analysis
#
# To analyze encrypted messages, to find out information about the possible
# algorithm or even language of the clear text message, one could perform
# frequency analysis. This process could be described as simply counting
# the number of times a certain symbol occurs in the given text.
# For example:
# For the text "test" the frequency of 'e' is 1, 's' is 1 and 't' is 2.
#
# The input to the function will be an encrypted body of text that only contains
# the lowercase letters a-z.
# As output you should return a list of the normalized frequency
# for each of the letters a-z.
# The normalized frequency is simply the number of occurrences, i,
# divided by the total number of characters in the message, n.
def freq_analysis(message):
##
# Your code here
##
return freq_list
#Tests
print freq_analysis("abcd")
#>>> [0.25, 0.25, 0.25, 0.25, 0.0, ..., 0.0]
print freq_analysis("adca")
#>>> [0.5, 0.0, 0.25, 0.25, 0.0, ..., 0.0]
print freq_analysis('bewarethebunnies')
#>>> [0.0625, 0.125, 0.0, 0.0, ..., 0.0]

Responding to Queries

Introduction

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/gXkELecZYlk.mp4

Welcome to Unit 4!

The notes for Unit 4 are here: PDF and web.

By the end of this unit, we’ll have a working search engine that can crawl and build an index of set of web pages, and respond to keyword queries!

You’ll also learn about designing and using complex data structures that build on the list structure we introduced in the previous unit.

Data Structures

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/pv5-RgG1pdk.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/nNEXCEH0dEw.mp4

Add to Index

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/B2J-bDQ4M1o.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
# Define a procedure, add_to_index,
# that takes 3 inputs:
# - an index: [[<keyword>,[<url>,...]],...]
# - a keyword: String
# - a url: String
# If the keyword is already
# in the index, add the url
# to the list of urls associated
# with that keyword.
# If the keyword is not in the index,
# add an entry to the index: [keyword,[url]]
index = []
def add_to_index(index,keyword,url):
#add_to_index(index,'udacity','http://udacity.com')
#add_to_index(index,'computing','http://acm.org')
#add_to_index(index,'udacity','http://npr.org')
#print index
#>>> [['udacity', ['http://udacity.com', 'http://npr.org']],
#>>> ['computing', ['http://acm.org']]]

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/SGkb6vqS7zA.mp4

1
2
3
4
5
6
def add_to_index(index,keyword,url):
for entry in index:
if entry[0]== keyword:
entry[1].append(url)
return
index.append([keyword,[url]])

Lookup

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/hzDJhLS4yCo.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
# Define a procedure, lookup,
# that takes two inputs:
# - an index
# - keyword
# The procedure should return a list
# of the urls associated
# with the keyword. If the keyword
# is not in the index, the procedure
# should return an empty list.
index = [['udacity', ['http://udacity.com', 'http://npr.org']],
['computing', ['http://acm.org']]]
def lookup(index,keyword):
for entry in index:
if entry[0]==keyword:
return entry[1]
return []
print lookup(index,'udacity')
#>>> ['http://udacity.com','http://npr.org']

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/bVjECgrnKj4.mp4

Building the Web Index

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/aRteT5uKqfg.mp4

Add Page to Index

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/_5rpzWzFnJM.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
# Define a procedure, add_page_to_index,
# that takes three inputs:
# - index
# - url (String)
# - content (String)
# It should update the index to include
# all of the word occurences found in the
# page content by adding the url to the
# word's associated url list.
index = []
def add_to_index(index,keyword,url):
for entry in index:
if entry[0] == keyword:
entry[1].append(url)
return
index.append([keyword,[url]])
def add_page_to_index(index,url,content):
contents = content.split()
for word in contents:
add_to_index(index,word,url)
add_page_to_index(index,'fake.text',"This is a test")
print index
#>>> [['This', ['fake.text']], ['is', ['fake.text']], ['a', ['fake.text']],
#>>> ['test',['fake.text']]]

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/i3V-Aw4y-hg.mp4

Finishing the Web Crawler

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/dQjsf-4cWo0.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/cPKnNmFTS80.mp4

Startup

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1XElSoLZfKQ.mp4

The Internet

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ePw5eGJXuw8.mp4

Networks

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/dy4KsLNw1lU.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/_8Xgtd4j7j8.mp4

Smoke Signals

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/8B6WSjA7DG8.mp4

Latency

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/6_1akTCAnt4.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/5NoF37dKsAI.mp4

Bandwidth

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/P83jTqcQ10A.mp4

Bits

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/6HCFOyZI9tA.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/4OxrAgA30T8.mp4

Buckets of Bits

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/IS7TO_lLXFE.mp4

What Is Your Bandwidth?

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/jG252FaodkA.mp4

Traceroute

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/kU30juVBCBg.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
C:\Users\SSQ>tracert www.udacity.com
通过最多 30 个跃点跟踪
到 apollo-mesos-elb-berlioz2-prod-885022263.us-west-2.elb.amazonaws.com [52.32.6
8.151] 的路由:
1 2 ms 60 ms 1 ms 192.168.1.1
2 14 ms 2 ms 4 ms 222.199.225.1
3 1 ms 1 ms 1 ms 202.4.128.193
4 342 ms 415 ms 227 ms 202.4.128.213
5 631 ms 413 ms 727 ms 172.30.33.5
6 683 ms 390 ms 638 ms 10.255.100.161
7 635 ms 532 ms 680 ms 124.205.98.145
8 * * * 请求超时。
9 * * * 请求超时。
10 * 707 ms * 14.197.246.209
11 483 ms 709 ms * 221.4.0.134
12 784 ms 701 ms 700 ms 221.4.0.133
13 736 ms 698 ms 721 ms 120.80.3.37
14 * 719 ms 768 ms 120.81.0.101
15 * * 793 ms 219.158.111.253
16 886 ms 646 ms 825 ms 219.158.13.98
17 886 ms 605 ms 601 ms 219.158.103.94
18 1135 ms 855 ms 1167 ms 219.158.116.234
19 956 ms 910 ms 1132 ms sjp-brdr-04.inet.qwest.net [63.146.27.85]
20 1157 ms 1001 ms * tuk-edge-13.inet.qwest.net [67.14.4.206]
21 1181 ms 962 ms 1157 ms 65-122-235-170.dia.static.qwest.net [65.122.235.
170]
22 * * * 请求超时。
23 * * * 请求超时。
24 * * * 请求超时。
25 * * * 请求超时。
26 * * * 请求超时。
27 1034 ms 995 ms 1037 ms 52.93.14.71
28 1007 ms * 1067 ms 52.93.14.70
29 * * * 请求超时。
30 955 ms 915 ms 926 ms 205.251.232.222
跟踪完成。

Traveling Data

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/bQqvRI8NSFo.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/8csDnLICd4w.mp4

Making a Network

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/jFElXIkFEhc.mp4

Protocols

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/0U31-O4oEPc.mp4

Conclusion

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/1wKlRFhn4zg.mp4

Lesson 16 Problem Set

Data Structures


https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/6rE8vvdYn2c.mp4

Ben Bitdiddle


Ben Bitdiddle suggests changing the index code by replacing the add_to_index and lookup procedures with the ones shown below the question.

def add_to_index(index, keyword, url):
    index.append([keyword, url])

def lookup(index, keyword):
    result = []
    for entry in index:
        if entry[0] == keyword:
            result.append(entry[1])
    return result

This changes the structure of index, but suppose the only way we use index is by calling add_to_index and lookup. How would this affect the search engine?

**It would produce the wrong results for some lookup queries.

It would produce the same results for all queries, but lookup would sometimes be faster than the original code.

It would produce the same results for all queries, but add_to_index would be faster and lookup would usually be slower than the original code.

It would produce the same results and take the same amount of time for all queries**

Old Code
def add_to_index(index, keyword, url):
for entry in index:
if entry[0] == keyword:
entry[1].append(url)
return

    # not found, add new keyword to index
    index.append([keyword, [url]])

def lookup(index, keyword):
    for entry in index:
        if entry[0] == keyword:
            return entry[1]
    return []

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/h5KA5t8yo3I.mp4

Networking


https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/rl7zOmndGLY.mp4

Better Splitting

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# 1 Gold Star
# The built-in <string>.split() procedure works
# okay, but fails to find all the words on a page
# because it only uses whitespace to split the
# string. To do better, we should also use punctuation
# marks to split the page into words.
# Define a procedure, split_string, that takes two
# inputs: the string to split and a string containing
# all of the characters considered separators. The
# procedure should return a list of strings that break
# the source string up by the characters in the
# splitlist.
def split_string(source,splitlist):
#out = split_string("This is a test-of the,string separation-code!"," ,!-")
#print out
#>>> ['This', 'is', 'a', 'test', 'of', 'the', 'string', 'separation', 'code']
#out = split_string("After the flood ... all the colors came out.", " .")
#print out
#>>> ['After', 'the', 'flood', 'all', 'the', 'colors', 'came', 'out']
#out = split_string("First Name,Last Name,Street Address,City,State,Zip Code",",")
#print out
#>>>['First Name', 'Last Name', 'Street Address', 'City', 'State', 'Zip Code']

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/alpdXaaSfGI.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
def split_string(source,splitlist):
output=[]
atsplit=True
for char in source:
if char in splitlist:
atsplit =True
else:
if atsplit:
output.append(char)
atsplit=False
else:
output[-1] =output[-1]+char
return output

Improving the Index

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
# The current index includes a url in the list of urls
# for a keyword multiple times if the keyword appears
# on that page more than once.
# It might be better to only include the same url
# once in the url list for a keyword, even if it appears
# many times.
# Modify add_to_index so that a given url is only
# included once in the url list for a keyword,
# no matter how many times that keyword appears.
def add_to_index(index, keyword, url):
for entry in index:
if entry[0] == keyword:
entry[1].append(url)
return
# not found, add new keyword to index
index.append([keyword, [url]])
def get_page(url):
try:
if url == "http://www.udacity.com/cs101x/index.html":
return '''<html> <body> This is a test page for learning to crawl!
<p> It is a good idea to
<a href="http://www.udacity.com/cs101x/crawling.html">
learn to crawl</a> before you try to
<a href="http://www.udacity.com/cs101x/walking.html">walk</a> or
<a href="http://www.udacity.com/cs101x/flying.html">fly</a>.</p></body>
</html>'''
elif url == "http://www.udacity.com/cs101x/crawling.html":
return '''<html> <body> I have not learned to crawl yet, but I am
quite good at <a href="http://www.udacity.com/cs101x/kicking.html">kicking</a>.
</body> </html>'''
elif url == "http://www.udacity.com/cs101x/walking.html":
return '''<html> <body> I cant get enough
<a href="http://www.udacity.com/cs101x/index.html">crawling</a></body></html>'''
elif url == "http://www.udacity.com/cs101x/flying.html":
return '''<html>
<body>The magic words are Squeamish Ossifrage!</body></html>'''
except:
return ""
return ""
def union(a, b):
for e in b:
if e not in a:
a.append(e)
def get_next_target(page):
start_link = page.find('<a href=')
if start_link == -1:
return None, 0
start_quote = page.find('"', start_link)
end_quote = page.find('"', start_quote + 1)
url = page[start_quote + 1:end_quote]
return url, end_quote
def get_all_links(page):
links = []
while True:
url, endpos = get_next_target(page)
if url:
links.append(url)
page = page[endpos:]
else:
break
return links
def crawl_web(seed):
tocrawl = [seed]
crawled = []
index = []
while tocrawl:
page = tocrawl.pop()
if page not in crawled:
content = get_page(page)
add_page_to_index(index, page, content)
union(tocrawl, get_all_links(content))
crawled.append(page)
return index
def add_page_to_index(index, url, content):
words = content.split()
for word in words:
add_to_index(index, word, url)
def lookup(index, keyword):
for entry in index:
if entry[0] == keyword:
return entry[1]
return None
#index = crawl_web("http://www.udacity.com/cs101x/index.html")
#print lookup(index,"is")
#>>> ['http://www.udacity.com/cs101x/index.html']

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/GD98Z_3cANU.mp4

1
2
3
4
5
6
7
8
def add_to_index(index, keyword, url):
for entry in index:
if entry[0] == keyword:
if not url in entry[1]:
entry[1].append(url)
return
# not found, add new keyword to index
index.append([keyword, [url]])

Counting Clicks

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
# 2 Gold Stars
# One way search engines rank pages
# is to count the number of times a
# searcher clicks on a returned link.
# This indicates that the person doing
# the query thought this was a useful
# link for the query, so it should be
# higher in the rankings next time.
# (In Unit 6, we will look at a different
# way of ranking pages that does not depend
# on user clicks.)
# Modify the index such that for each url in a
# list for a keyword, there is also a number
# that counts the number of times a user
# clicks on that link for this keyword.
# The result of lookup(index,keyword) should
# now be a list of url entries, where each url
# entry is a list of a url and a number
# indicating the number of times that url
# was clicked for this query keyword.
# You should define a new procedure to simulate
# user clicks for a given link:
# record_user_click(index,word,url)
# that modifies the entry in the index for
# the input word by increasing the count associated
# with the url by 1.
# You also will have to modify add_to_index
# in order to correctly create the new data
# structure, and to prevent the repetition of
# entries as in homework 4-5.
def record_user_click(index,keyword,url):
def add_to_index(index, keyword, url):
for entry in index:
if entry[0] == keyword:
entry[1].append(url)
return
# not found, add new keyword to index
index.append([keyword, [url]])
def get_page(url):
try:
if url == "http://www.udacity.com/cs101x/index.html":
return '''<html> <body> This is a test page for learning to crawl!
<p> It is a good idea to
<a href="http://www.udacity.com/cs101x/crawling.html">
learn to crawl</a> before you try to
<a href="http://www.udacity.com/cs101x/walking.html">walk</a> or
<a href="http://www.udacity.com/cs101x/flying.html">fly</a>.</p></body></html>'''
elif url == "http://www.udacity.com/cs101x/crawling.html":
return '''<html> <body> I have not learned to crawl yet, but I am
quite good at <a href="http://www.udacity.com/cs101x/kicking.html">kicking</a>.
</body> </html>'''
elif url == "http://www.udacity.com/cs101x/walking.html":
return '''<html> <body> I cant get enough
<a href="http://www.udacity.com/cs101x/index.html">crawling</a>!</body></html>'''
elif url == "http://www.udacity.com/cs101x/flying.html":
return '<html><body>The magic words are Squeamish Ossifrage!</body></html>'
except:
return ""
return ""
def union(a, b):
for e in b:
if e not in a:
a.append(e)
def get_next_target(page):
start_link = page.find('<a href=')
if start_link == -1:
return None, 0
start_quote = page.find('"', start_link)
end_quote = page.find('"', start_quote + 1)
url = page[start_quote + 1:end_quote]
return url, end_quote
def get_all_links(page):
links = []
while True:
url, endpos = get_next_target(page)
if url:
links.append(url)
page = page[endpos:]
else:
break
return links
def crawl_web(seed):
tocrawl = [seed]
crawled = []
index = []
while tocrawl:
page = tocrawl.pop()
if page not in crawled:
content = get_page(page)
add_page_to_index(index, page, content)
union(tocrawl, get_all_links(content))
crawled.append(page)
return index
def add_page_to_index(index, url, content):
words = content.split()
for word in words:
add_to_index(index, word, url)
def lookup(index, keyword):
for entry in index:
if entry[0] == keyword:
return entry[1]
return None
#Here is an example showing a sequence of interactions:
index = crawl_web('http://www.udacity.com/cs101x/index.html')
print lookup(index, 'good')
#>>> [['http://www.udacity.com/cs101x/index.html', 0],
#>>> ['http://www.udacity.com/cs101x/crawling.html', 0]]
record_user_click(index, 'good', 'http://www.udacity.com/cs101x/crawling.html')
print lookup(index, 'good')
#>>> [['http://www.udacity.com/cs101x/index.html', 0],
#>>> ['http://www.udacity.com/cs101x/crawling.html', 1]]

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/XAb3iFZfOl0.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
def record_user_click(index, keyword, url):
urls = lookup(index, keyword)
if urls:
for entry in urls:
if entry[0] == url:
entry[1] = entry[1]+1
def add_to_index(index, keyword, url):
# format of index: [[keyword, [[url, count], [url, count],..]],...]
for entry in index:
if entry[0] == keyword:
for urls in entry[1]:
if urls[0] == url:
return
entry[1].append([url,0])
return
# not found, add new keyword to index
index.append([keyword, [[url,0]]])

Time Spent at Routers

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/jqNiHcMsS_Y.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TWEE2f1w55U.mp4

Problem Set(Optional)

Word Count

Latency

Converting Seconds

Download Calculator

How Programs Run

Introduction

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/XJfrUOoQSOI.mp4
Welcome to Unit 5!

The notes for Unit 5 are here: PDF and web.

The main goal of Unit 5 is to learn about how computer scientists measure cost, which is mostly about understanding how the resources needed to run a program scale with the size of its input. We’ll also learn about implementing and using a hash table, a data structure that will massively improve the performance of our search engine.

It was a privilege to meet with Gabriel Weinberg, the founder of DuckDuckGo, to film this introduction. DuckDuckGo protects the privacy of its users and gets around 3 million searches per day. Gabriel’s blog is full of interesting articles about computing and startups.

Making Things Fast

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/JIeuI6mknUk.mp4

Measuring Speed


https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/5vnXm71KECU.mp4

Stopwatch

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ms0iENK29jA.mp4

Spin Loop

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/hBw8qWGHrEs.mp4

Predicting Run Time

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/HBwT29hWXrs.mp4

Make Big Index

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/zxfXpB6U_0w.mp4

Index Size Vs. Time

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/yYm5t1wLarM.mp4
Sample timings:

>>> time_execution('lookup(index10000, "udacity")')
(None, 0.000968000000000302)

>>> time_execution('lookup(index10000, "udacity")')
(None, 0.000905999999863066)

>>> time_execution('lookup(index100000, "udacity")')
(None, 0.008590000000002652)

>>> time_execution('lookup(index100000, "udacity")')
(None, 0.008517999999998093)

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/5DHrCwtBuGU.mp4

Lookup Time

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/PsCqA6fJ1hk.mp4
This quiz depends on the code for make_big_index(size) from a few segments before:

def make_big_index(size):
    index = []
    letters = ['a', 'a', 'a', 'a', 'a', 'a', 'a', 'a']
    while len(index) < size:
        word = make_string(letters)
        add_to_index(index, word, 'fake')
        for i in range(len(letters) - 1, 0, -1): 
            if letters[i] < 'z':
                  letters[i] = chr(ord(letters[i]) + 1)
                  break
            else:
                   letters[i] = 'a'
    return index

This quiz depends on the code for make_big_index(size) from a few segments before (as well as the code for lookup and add_to_index):
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/WtfufPxl8Mw.mp4

Worst Case

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TBylO5VopA4.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/26jeGBtszyk.mp4

Fast Enough

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/lSakl4WtFiE.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/rbYT97miQMY.mp4

Making Lookup Faster

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/AArXvYMTCOM.mp4

Hash Table

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/KxGQbWGPeak.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/fdddZ5zcHyI.mp4

Hash Function

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/xzQy09kBswM.mp4
ord()
ord('a')->97
chr()

Modulus Operator

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/b2J5RyLdNy8.mp4

Modulus Quiz

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/l8cjHI9UbW4.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/nmV96-OcGi4.mp4

Equivalent Expressions

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/yoUU_QDJv4o.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TRuWp6uRBKI.mp4

Bad Hash

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/gGSY4yAusdk.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/qn99D3acUnA.mp4

Better Hash Functions

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/SKbp6T6C-0Q.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# Define a function, hash_string,
# that takes as inputs a keyword
# (string) and a number of buckets,
# and returns a number representing
# the bucket for that keyword.
def hash_string(keyword,buckets):
#print hash_string('a',12)
#>>> 1
#print hash_string('b',12)
#>>> 2
#print hash_string('a',13)
#>>> 6
#print hash_string('au',12)
#>>> 10
#print hash_string('udacity',12)
#>>> 11

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/9vGXclxo8Kc.mp4
My claim about the performance being better with the % buckets inside the loop is (often, and possibly always?) incorrect. Some enterprising students have done experiments showing this, and there is more discussion in the forum.

Testing Hash Functions

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TjWwI-MvEhI.mp4

Keywords and Buckets

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/2cL69wIOpVk.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/0k-hAMfA5uY.mp4

Implementing Hash Tables

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/sRfcPW1Rj_4.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/3AN9tyu_w-I.mp4

Empty Hash Table

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/a_NjE-wJQGc.mp4

1
2
3
4
5
6
7
# Creating an Empty Hash Table
# Define a procedure, make_hashtable,
# that takes as input a number, nbuckets,
# and returns an empty hash table with
# nbuckets empty buckets.
def make_hashtable(nbuckets):

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/e3gDr_MWqDA.mp4

The Hard Way

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/cpAkNOOdzLw.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/GY8OtZj6LZA.mp4

Finding Buckets

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/0ZOo7GAm2qU.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# Define a procedure, hashtable_get_bucket,
# that takes two inputs - a hashtable, and
# a keyword, and returns the bucket where the
# keyword could occur.
def hashtable_get_bucket(htable,keyword):
def hash_string(keyword,buckets):
out = 0
for s in keyword:
out = (out + ord(s)) % buckets
return out
def make_hashtable(nbuckets):
table = []
for unused in range(0,nbuckets):
table.append([])
return table
#table = [[['Francis', 13], ['Ellis', 11]], [], [['Bill', 17],
#['Zoe', 14]], [['Coach', 4]], [['Louis', 29], ['Rochelle', 4], ['Nick', 2]]]
#print hashtable_get_bucket(table, "Zoe")
#>>> [['Bill', 17], ['Zoe', 14]]
#print hashtable_get_bucket(table, "Brick")
#>>> []
#print hashtable_get_bucket(table, "Lilith")
#>>> [['Louis', 29], ['Rochelle', 4], ['Nick', 2]]

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/e_ZLxgElqks.mp4

Adding Keywords

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/VFulPFO-OS0.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
# Define a procedure,
#
# hashtable_add(htable,key,value)
#
# that adds the key to the hashtable (in
# the correct bucket), with the correct
# value and returns the new hashtable.
#
# (Note that the video question and answer
# do not return the hashtable, but your code
# should do this to pass the test cases.)
def hashtable_add(htable,key,value):
# your code here
return htable
def hashtable_get_bucket(htable,keyword):
return htable[hash_string(keyword,len(htable))]
def hash_string(keyword,buckets):
out = 0
for s in keyword:
out = (out + ord(s)) % buckets
return out
def make_hashtable(nbuckets):
table = []
for unused in range(0,nbuckets):
table.append([])
return table
#table = make_hashtable(5)
#hashtable_add(table,'Bill', 17)
#hashtable_add(table,'Coach', 4)
#hashtable_add(table,'Ellis', 11)
#hashtable_add(table,'Francis', 13)
#hashtable_add(table,'Louis', 29)
#hashtable_add(table,'Nick', 2)
#hashtable_add(table,'Rochelle', 4)
#hashtable_add(table,'Zoe', 14)
#print table
#>>> [[['Ellis', 11], ['Francis', 13]], [], [['Bill', 17], ['Zoe', 14]],
#>>> [['Coach', 4]], [['Louis', 29], ['Nick', 2], ['Rochelle', 4]]]

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ge6HmR7EuDI.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
# Define a procedure,
#
# hashtable_add(htable,key,value)
#
# that adds the key to the hashtable (in
# the correct bucket), with the correct
# value and returns the new hashtable.
#
# (Note that the video question and answer
# do not return the hashtable, but your code
# should do this to pass the test cases.)
def hashtable_add(htable,key,value):
# your code here
bucket = hashtable_get_bucket(htable,key)
bucket.append([key,value])
return htable
def hashtable_get_bucket(htable,keyword):
return htable[hash_string(keyword,len(htable))]
def hash_string(keyword,buckets):
out = 0
for s in keyword:
out = (out + ord(s)) % buckets
return out
def make_hashtable(nbuckets):
table = []
for unused in range(0,nbuckets):
table.append([])
return table
table = make_hashtable(5)
hashtable_add(table,'Bill', 17)
hashtable_add(table,'Coach', 4)
hashtable_add(table,'Ellis', 11)
hashtable_add(table,'Francis', 13)
hashtable_add(table,'Louis', 29)
hashtable_add(table,'Nick', 2)
hashtable_add(table,'Rochelle', 4)
hashtable_add(table,'Zoe', 14)
print table
#>>> [[['Ellis', 11], ['Francis', 13]], [], [['Bill', 17], ['Zoe', 14]],
#>>> [['Coach', 4]], [['Louis', 29], ['Nick', 2], ['Rochelle', 4]]]

Lookup

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/vJJ0wakAu7s.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
# Define a procedure,
# hashtable_lookup(htable,key)
# that takes two inputs, a hashtable
# and a key (string),
# and returns the value associated
# with that key.
def hashtable_lookup(htable,key):
def hashtable_add(htable,key,value):
bucket = hashtable_get_bucket(htable,key)
bucket.append([key,value])
def hashtable_get_bucket(htable,keyword):
return htable[hash_string(keyword,len(htable))]
def hash_string(keyword,buckets):
out = 0
for s in keyword:
out = (out + ord(s)) % buckets
return out
def make_hashtable(nbuckets):
table = []
for unused in range(0,nbuckets):
table.append([])
return table
#table = [[['Ellis', 11], ['Francis', 13]], [], [['Bill', 17], ['Zoe', 14]],
#[['Coach', 4]], [['Louis', 29], ['Nick', 2], ['Rochelle', 4]]]
#print hashtable_lookup(table, 'Francis')
#>>> 13
#print hashtable_lookup(table, 'Louis')
#>>> 29
#print hashtable_lookup(table, 'Zoe')
#>>> 14

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/cKkaGzt9pwk.mp4

Update

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
# Define a procedure,
# hashtable_update(htable,key,value)
# that updates the value associated with key. If key is already in the
# table, change the value to the new value. Otherwise, add a new entry
# for the key and value.
# Hint: Use hashtable_lookup as a starting point.
# Make sure that you return the new htable
def hashtable_update(htable,key,value):
# Your code here
return htable
def hashtable_lookup(htable,key):
bucket = hashtable_get_bucket(htable,key)
for entry in bucket:
if entry[0] == key:
return entry[1]
return None
def hashtable_add(htable,key,value):
bucket = hashtable_get_bucket(htable,key)
bucket.append([key,value])
def hashtable_get_bucket(htable,keyword):
return htable[hash_string(keyword,len(htable))]
def hash_string(keyword,buckets):
out = 0
for s in keyword:
out = (out + ord(s)) % buckets
return out
def make_hashtable(nbuckets):
table = []
for unused in range(0,nbuckets):
table.append([])
return table
table = [[['Ellis', 11], ['Francis', 13]], [], [['Bill', 17], ['Zoe', 14]],
[['Coach', 4]], [['Louis', 29], ['Nick', 2], ['Rochelle', 4]]]
#hashtable_update(table, 'Bill', 42)
#hashtable_update(table, 'Rochelle', 94)
#hashtable_update(table, 'Zed', 68)
#print table
#>>> [[['Ellis', 11], ['Francis', 13]], [['Zed', 68]], [['Bill', 42],
#>>> ['Zoe', 14]], [['Coach', 4]], [['Louis', 29], ['Nick', 2],
#>>> ['Rochelle', 94]]]

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/UPiqKaXshfw.mp4

Dictionaries

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Tne9hgBqCUY.mp4

Using Dictionaries

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/5wTxBLzR5aM.mp4
For further information on Hash Tables in Python, please refer to this article here

Population

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/em3CWlSaEKY.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
# Define a Dictionary, population,
# that provides information
# on the world's largest cities.
# The key is the name of a city
# (a string), and the associated
# value is its population in
# millions.
# Key | Value
# Shanghai | 17.8
# Istanbul | 13.3
# Karachi | 13.0
# Mumbai | 12.5

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/H-eGAkgjg_s.mp4

A Noble Gas

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/qJNsfRfFw-c.mp4

Modifying the Search Engine

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ncC1XboU_lo.mp4
Here’s the code in a more readable format:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
def get_all_links(page):
links = []
while True:
url, endpos = get_next_target(page)
if url:
links.append(url) page = page[endpos:]
else:
break
return links
def crawl_web(seed):
tocrawl = [seed]
crawled = []
index = []
while tocrawl:
page = tocrawl.pop()
if page not in crawled:
content = get_page(page)
add_page_to_index(index, page, content)
union(tocrawl, get_all_links(content))
crawled.append(page)
return index
def add_page_to_index(index, url, content):
words = content.split()
for word in words:
add_to_index(index, word, url)
def add_to_index(index, keyword, url):
for entry in index:
if entry[0] == keyword: entry[1].append(url)
return
# not found, add new keyword to index
index.append([keyword, [url]])
def lookup(index, keyword):
for entry in index:
if entry[0] == keyword:
return entry[1]
return None

Here’s the code in a more readable way: (thanks to Christina-49) def get_all_links(page): links = [] while True: url, endpos = get_next_target(page) if url: links.append(url) page = page[endpos:] else: break return links

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
def crawl_web(seed):
tocrawl = [seed]
crawled = []
index = []
while tocrawl:
page = tocrawl.pop()
if page not in crawled:
content = get_page(page)
add_page_to_index(index, page, content)
union(tocrawl, get_all_links(content))
crawled.append(page)
return index
def add_page_to_index(index, url, content):
words = content.split()
for word in words:
add_to_index(index, word, url)
def add_to_index(index, keyword, url):
for entry in index:
if entry[0] == keyword:
entry[1].append(url)
return
# not found, add new keyword to index
index.append([keyword, [url]])
def lookup(index, keyword):
for entry in index:
if entry[0] == keyword:
return entry[1]
return None

Here’s the code in a more readable format: (thanks to Christina-49)
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/NBJx7q8XNpE.mp4

Changing Lookup

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/OdToP6LQRoc.mp4

1
2
3
4
5
6
7
8
# Change the lookup procedure
# to now work with dictionaries.
def lookup(index, keyword):
if keyword in index:
return index[keyword]
else:
return None

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/avNhSME0qxQ.mp4

Coming Up Next

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/-qYyswP4FqI.mp4

Problem Set

Growth

Measuring Cost

For which of these procedures does the worst-case running time scale linearly in the number of elements in the input list p? (You may assume that the elements in the list are all small numbers)

Sum_list

def sum_list(p):
    sum = 0
    for e in p:
        sum = sum + e
    return sum
Has_duplicate_element
def has_duplicate_element(p):
   res = []
   for i in range(0, len(p)):
       for j in range(0, len(p)):
           if i != j and p[i] == p[j]:
               return True
   return False
Mystery
def mystery(p):
   i = 0
   while True:
       if i >= len(p):
           break
       if p[i] % 2:
           i = i + 2
       else:
           i = i + 1
   return i

Peter muddles up odd and even in the last question. The statement p[i] % 2 is True whenp[i] is odd and False when it is even, so the worst case is when all the elements in the list are even.
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/tlFdhxXJzaw.mp4

Hash String

Hash String

Suppose we have a hash table implemented as described in Unit 5 using the hash_string function.

def hash_string(keyword, buckets):
    h = 0
    for c in keyword:
        h = (h + ord(c)) % buckets 
    return h

Which of the following are true statements?

Statement 1

The number of string comparisons done to lookup a keyword that is not a key in the hash table may be less than the number needed to lookup a keyword that is a key in the hash table.

Statement 2

We should expect the time to lookup most keywords in the hash table will decrease as we increase the number of buckets.

Statement 3

It is always better to have more buckets in a hash table.

Statement 4

The time to lookup a keyword in the hash table is always less than the time it would take in a linear time list (as used in Unit 4).

Is Offered

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
# Dictionaries of Dictionaries (of Dictionaries)
# The next several questions concern the data structure below for keeping
# track of Udacity's courses (where all of the values are strings):
# { <hexamester>, { <class>: { <property>: <value>, ... },
# ... },
# ... }
#For example,
courses = {
'feb2012': { 'cs101': {'name': 'Building a Search Engine',
'teacher': 'Dave',
'assistant': 'Peter C.'},
'cs373': {'name': 'Programming a Robotic Car',
'teacher': 'Sebastian',
'assistant': 'Andy'}},
'apr2012': { 'cs101': {'name': 'Building a Search Engine',
'teacher': 'Dave',
'assistant': 'Sarah'},
'cs212': {'name': 'The Design of Computer Programs',
'teacher': 'Peter N.',
'assistant': 'Andy',
'prereq': 'cs101'},
'cs253':
{'name': 'Web Application Engineering - Building a Blog',
'teacher': 'Steve',
'prereq': 'cs101'},
'cs262':
{'name': 'Programming Languages - Building a Web Browser',
'teacher': 'Wes',
'assistant': 'Peter C.',
'prereq': 'cs101'},
'cs373': {'name': 'Programming a Robotic Car',
'teacher': 'Sebastian'},
'cs387': {'name': 'Applied Cryptography',
'teacher': 'Dave'}},
'jan2044': { 'cs001': {'name': 'Building a Quantum Holodeck',
'teacher': 'Dorina'},
'cs003': {'name': 'Programming a Robotic Robotics Teacher',
'teacher': 'Jasper'},
}
}
# If you want to loop through the keys in the dictionary,
# you can use the construct below.
# for <key> in <dictionary>:
# <block>
# For example, this procedure returns a list of all the courses offered
# in the given hexamester:
def courses_offered(courses, hexamester):
res = []
for c in courses[hexamester]:
res.append(c)
return res
# You do not need to use this code if you do not want to and may find another,
# simpler method to answer this question, although later ones may require this.
# Define a procedure, is_offered(courses, course, hexamester), that returns
# True if the input course is offered in the input hexamester, and returns
# False otherwise. For example,
#print is_offered(courses, 'cs101', 'apr2012')
#>>> True
#print is_offered(courses, 'cs003', 'apr2012')
#>>> False
# (Note: it is okay if your procedure produces an error if the input
# hexamester is not included in courses.
# For example, is_offered(courses, 'cs101', 'dec2011') can produce an error.)
# However, do not leave any uncommented statements in your code which lead
# to an error as your code will be graded as incorrect.
def is_offered(courses, course, hexamester):
#print is_offered(courses, 'cs101', 'apr2012')
#>>> True
#print is_offered(courses, 'cs003', 'apr2012')
#>>> False
#print is_offered(courses, 'cs001', 'jan2044')
#>>> True
#print is_offered(courses, 'cs253', 'feb2012')
#>>> False

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Qq8Hd290n5c.mp4

When Offered

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
# Dictionaries of Dictionaries (of Dictionaries)
# The next several questions concern the data structure below for keeping
# track of Udacity's courses (where all of the values are strings):
# { <hexamester>, { <class>: { <property>: <value>, ... },
# ... },
# ... }
# For example,
courses = {
'feb2012': { 'cs101': {'name': 'Building a Search Engine',
'teacher': 'Dave',
'assistant': 'Peter C.'},
'cs373': {'name': 'Programming a Robotic Car',
'teacher': 'Sebastian',
'assistant': 'Andy'}},
'apr2012': { 'cs101': {'name': 'Building a Search Engine',
'teacher': 'Dave',
'assistant': 'Sarah'},
'cs212': {'name': 'The Design of Computer Programs',
'teacher': 'Peter N.',
'assistant': 'Andy',
'prereq': 'cs101'},
'cs253': {'name': 'Web Application Engineering - Building a Blog',
'teacher': 'Steve',
'prereq': 'cs101'},
'cs262': {'name': 'Programming Languages - Building a Web Browser',
'teacher': 'Wes',
'assistant': 'Peter C.',
'prereq': 'cs101'},
'cs373': {'name': 'Programming a Robotic Car',
'teacher': 'Sebastian'},
'cs387': {'name': 'Applied Cryptography',
'teacher': 'Dave'}},
'jan2044': { 'cs001': {'name': 'Building a Quantum Holodeck',
'teacher': 'Dorina'},
'cs003': {'name': 'Programming a Robotic Robotics Teacher',
'teacher': 'Jasper'},
}
}
# For the following questions, you will find the
# for <key> in <dictionary>:
# <block>
# construct useful. This loops through the key values in the Dictionary. For
# example, this procedure returns a list of all the courses offered in the given
# hexamester:
def courses_offered(courses, hexamester):
res = []
for c in courses[hexamester]:
res.append(c)
return res
# Define a procedure, when_offered(courses, course), that takes a courses data
# structure and a string representing a class, and returns a list of strings
# representing the hexamesters when the input course is offered.
def when_offered(courses,course):
#print when_offered (courses, 'cs101')
#>>> ['apr2012', 'feb2012']
#print when_offered(courses, 'bio893')
#>>> []

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/hftOGwEW4qY.mp4

Involved

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
# Dictionaries of Dictionaries (of Dictionaries)
# The next several questions concern the data structure below for keeping
# track of Udacity's courses (where all of the values are strings):
# { <hexamester>, { <class>: { <property>: <value>, ... },
# ... },
# ... }
# For example,
courses = {
'feb2012': { 'cs101': {'name': 'Building a Search Engine',
'teacher': 'Dave',
'assistant': 'Peter C.'},
'cs373': {'name': 'Programming a Robotic Car',
'teacher': 'Sebastian',
'assistant': 'Andy'}},
'apr2012': { 'cs101': {'name': 'Building a Search Engine',
'teacher': 'Dave',
'assistant': 'Sarah'},
'cs212': {'name': 'The Design of Computer Programs',
'teacher': 'Peter N.',
'assistant': 'Andy',
'prereq': 'cs101'},
'cs253':
{'name': 'Web Application Engineering - Building a Blog',
'teacher': 'Steve',
'prereq': 'cs101'},
'cs262':
{'name': 'Programming Languages - Building a Web Browser',
'teacher': 'Wes',
'assistant': 'Peter C.',
'prereq': 'cs101'},
'cs373': {'name': 'Programming a Robotic Car',
'teacher': 'Sebastian'},
'cs387': {'name': 'Applied Cryptography',
'teacher': 'Dave'}},
'jan2044': { 'cs001': {'name': 'Building a Quantum Holodeck',
'teacher': 'Dorina'},
'cs003': {'name': 'Programming a Robotic Robotics Teacher',
'teacher': 'Jasper'},
}
}
# For the following questions, you will find the
# for <key> in <dictionary>:
# <block>
# construct useful. This loops through the key values in the Dictionary. For
# example, this procedure returns a list of all the courses offered in the given
# hexamester:
def courses_offered(courses, hexamester):
res = []
for c in courses[hexamester]:
res.append(c)
return res
# [Double Gold Star] Define a procedure, involved(courses, person), that takes
# as input a courses structure and a person and returns a Dictionary that
# describes all the courses the person is involved in. A person is involved
# in a course if they are a value for any property for the course. The output
# Dictionary should have hexamesters as its keys, and each value should be a
# list of courses that are offered that hexamester (the courses in the list
# can be in any order).
def involved(courses, person):
# For example:
#print involved(courses, 'Dave')
#>>> {'apr2012': ['cs101', 'cs387'], 'feb2012': ['cs101']}
#print involved(courses, 'Peter C.')
#>>> {'apr2012': ['cs262'], 'feb2012': ['cs101']}
#print involved(courses, 'Dorina')
#>>> {'jan2044': ['cs001']}
#print involved(courses,'Peter')
#>>> {}
#print involved(courses,'Robotic')
#>>> {}
#print involved(courses, '')
#>>> {}

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ej9rXa13kr4.mp4

Refactoring

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
# 6. In video 28. Update, it was suggested that some of the duplicate code in
# lookup and update could be avoided by a better design. We can do this by
# defining a procedure that finds the entry corresponding to a given key, and
# using that in both lookup and update.
# Here are the original procedures:
def hashtable_update(htable, key, value):
bucket = hashtable_get_bucket(htable, key)
for entry in bucket:
if entry[0] == key:
entry[1] = value
return
bucket.append([key, value])
def hashtable_lookup(htable, key):
bucket = hashtable_get_bucket(htable, key)
for entry in bucket:
if entry[0] == key:
return entry[1]
return None
def make_hashtable(size):
table = []
for unused in range(0, size):
table.append([])
return table
def hash_string(s, size):
h = 0
for c in s:
h = h + ord(c)
return h % size
def hashtable_get_bucket(htable, key):
return htable[hash_string(key, len(htable))]
# Whenever we have duplicate code like the loop that finds the entry in
# hashtable_update and hashtable_lookup, we should think if there is a better way
# to write this that would avoid the duplication. We should be able to rewrite
# these procedures to be shorter by defining a new procedure and rewriting both
# hashtable_update and hashtable_lookup to use that procedure.
# Modify the code for both hashtable_update and hashtable_lookup to have the same
# behavior they have now, but using fewer lines of code in each procedure. You
# should define a new procedure to help with this. Your new version should have
# approximately the same running time as the original version, but neither
# hashtable_update or hashtable_lookup should include any for or while loop, and
# the block of each procedure should be no more than 6 lines long.
# Your procedures should have the same behavior as the originals. For example,
table = make_hashtable(10)
hashtable_update(table, 'Python', 'Monty')
hashtable_update(table, 'CLU', 'Barbara Liskov')
hashtable_update(table, 'JavaScript', 'Brendan Eich')
hashtable_update(table, 'Python', 'Guido van Rossum')
print hashtable_lookup(table, 'Python')
#>>> Guido van Rossum

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/EU9NvdGoAt4.mp4

Memoization

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
# [Double Gold Star] Memoization is a way to make code run faster by saving
# previously computed results. Instead of needing to recompute the value of an
# expression, a memoized computation first looks for the value in a cache of
# pre-computed values.
# Define a procedure, cached_execution(cache, proc, proc_input), that takes in
# three inputs: a cache, which is a Dictionary that maps inputs to proc to
# their previously computed values, a procedure, proc, which can be called by
# just writing proc(proc_input), and proc_input which is the input to proc.
# Your procedure should return the value of the proc with input proc_input,
# but should only evaluate it if it has not been previously called.
def cached_execution(cache, proc, proc_input):
# Your code here
# Here is an example showing the desired behavior of cached_execution:
def factorial(n):
print "Running factorial"
result = 1
for i in range(2, n + 1):
result = result * i
return result
cache = {} # start cache as an empty dictionary
### first execution (should print out Running factorial and the result)
print cached_execution(cache, factorial, 50)
print "Second time:"
### second execution (should only print out the result)
print cached_execution(cache, factorial, 50)
# Here is a more interesting example using cached_execution
# (do not worry if you do not understand this, though,
# it will be clearer after Unit 6):
def cached_fibo(n):
if n == 1 or n == 0:
return n
else:
return (cached_execution(cache, cached_fibo, n - 1 )
+ cached_execution(cache, cached_fibo, n - 2 ))
cache = {} # new cache for this procedure
# do not try this at home...at least without a cache!
print cached_execution(cache, cached_fibo,100)

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/_aPiXzmiems.mp4

Problem Set(Optional)

Shift a Letter

Shift n Letter

Rotate

Q&A

Hash Tables

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/eiktSrhdrxs.mp4

Rehashing

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/UMsVMW2S53w.mp4

Importing Libraries

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/G3ovp33txfc.mp4

Programming Literacy

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/0oXF2nOTX6I.mp4

How to Have Infinite Power

Infinite Power

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/emhiUKHuBXY.mp4
The notes for Unit 6 are here: PDF and web.

This unit introduces what I think is the most fascinating and powerful idea in all of computing - recursive definitions. Understanding them requires some mind-bending gymnastics, but once you do, you will find elegant and powerful new ways to think about nearly all problems you encounter.

The course moves through this pretty quickly, but fortunately many students have contributed great additional resources that explain things very well and with more detail than I do in the course, and give you more practice with recursive programs.

Here are a few that I think are especially good:

Yet Another Attempt to Explain Recursion by Goldsong
Understanding Recursion: The Stack Model by Charles Lin
StG’s Recursion Collection by Sam the Great

Long Words

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/-PhZlJuDf_o.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/XzJO5xc3QIk.mp4

Counter

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Ap3okJ5jIUE.mp4

Counter Quiz

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/6s-aT1rO3JQ.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/YnPLnU9D3mQ.mp4

Expanding Our Grammar

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/qYsl757ShjA.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/nU2DBYNw1jM.mp4

Recursive Definitions

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/LinhpqM4cCg.mp4

Ancestors

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/_AQRlt9UA4o.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Ip3vojOsIkI.mp4

Recursive Procedures

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Wa6y0I_uojk.mp4

Recursive Factorial

Around ~0:06, Dave says that factorial takes a positive whole number as its input, but factorial can also take 0 as an input as well. Instead, then, the input to factorial can be any positive integer or 0.

(Side note: whole numbers are defined differently in different contexts, but they are often defined as all of the non-negative integers. This means the whole numbers are 0, 1, 2, 3, 4…, and if we use this terminology, factorial could take any whole number as its input.)

Note: If you get a The server encountered an error. Please try running again. error, that may mean that your program is not terminating when tested. Make sure your recursion will eventually reach a base case.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
# Define a procedure, factorial, that takes a natural number as its input, and
# returns the number of ways to arrange the input number of items.
def factorial(n):
#print factorial(0)
#>>> 1
#print factorial(5)
#>>> 120
#print factorial(10)
#>>> 3628800

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/gWoWZHonPdE.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
# Define a procedure, factorial, that takes a natural number as its input, and
# returns the number of ways to arrange the input number of items.
def factorial(n):
if n==0:
return 1
else:
return n*factorial(n-1)
print factorial(0)
#>>> 1
print factorial(5)
#>>> 120
print factorial(10)
#>>> 3628800

Palindromes

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/zWnI5eACaLM.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# Define a procedure is_palindrome, that takes as input a string, and returns a
# Boolean indicating if the input string is a palindrome.
# Base Case: '' => True
# Recursive Case: if first and last characters don't match => False
# if they do match, is the middle a palindrome?
def is_palindrome(s):
if len(s)==0:
return True
else:
if s[0]==s[-1]:
s=s[1:-1]
return is_palindrome(s)
else:
return False
print is_palindrome('')
#>>> True
print is_palindrome('abab')
#>>> False
print is_palindrome('abba')
#>>> True

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/LTZXRoLZaJQ.mp4

Recursive Vs Iterative

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/kirruxKHaqk.mp4

Bunnies

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/E6oGm_Z2aQk.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# Define a procedure, fibonacci, that takes a natural number as its input, and
# returns the value of that fibonacci number.
# Two Base Cases:
# fibonacci(0) => 0
# fibonacci(1) => 1
# Recursive Case:
# n > 1 : fibonacci(n) => fibonacci(n-1) + fibonacci(n-2)
def fibonacci(n):
if n>1:
return fibonacci(n-1) + fibonacci(n-2)
else:
if n==1:
return 1
else:
return 0
print fibonacci(0)
#>>> 0
print fibonacci(1)
#>>> 1
print fibonacci(15)
#>>> 610

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/S6wCTLG8BJg.mp4

Divide and Be Conquered

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/eBu1Z_FBwb4.mp4
Note that this is the standard mathematical definition of the Fibonacci sequence, which is a bit different from the counting rabbits motivation in the original problem. The mathematical sequence starts with 0, which is more elegant mathematically, but wouldn’t make as much sense for rabbits multiplying.

Counting Calls

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Cai4WuKg4SM.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/bqF8QuYX8YA.mp4
At 1:58 minutes onwards, the formula should be fibo(36 - (n - 1)) = fibo(36 - n + 1) and not fibo(36 - n - 1).

Faster Fibonacci

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/oJAHMzgSTyA.mp4

1
2
3
4
5
6
7
8
9
10
11
#Define a faster fibonacci procedure that will enable us to computer
#fibonacci(36).
def fibonacci(n):
current = 0 # fibonacci(0) at the bginning
after = 1 # fibonacci(1) at the beginning
for i in range(0, n):
current, after = after, current + after
return current
print fibonacci(36)
#>>> 14930352

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/7k7tMKxH6Dg.mp4

Ranking Web Pages

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/k32gyEM5H3Y.mp4

Popularity

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/EP55W6keH7E.mp4

Good Definitions

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/FX8RlDKEEiU.mp4
A note on the notation: friends(p) is a list of all friends of p.
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/HbLwTw6N-0s.mp4

Circular Definitions

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Rxp6JuoNqL0.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/YUwZCZVtLaU.mp4

Relaxation

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/e7-gweWZ0io.mp4
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/kpXVV8aiZFU.mp4

Page Rank

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/IKXvSKaI2Ko.mp4

Altavista

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/T6AeBtfaLco.mp4
AltaVista was finally shut down in July 2013. Here’s an interesting article from the Washington Post: AltaVista is dead. Here’s why it’s so hard to compete with Google. (mostly an interview with Gabriel Weinberg).
https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/0ZlFPQ2qQo0.mp4

Urank

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/H8vrZMAllIY.mp4

Implementing Urank

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/B5lAVjLd76Q.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
# Modify the crawl_web procedure so that instead of just returning the
# index, it returns an index and a graph. The graph should be a
# Dictionary where the key:value entries are:
# url: [list of pages url links to]
def crawl_web(seed): # returns index, graph of outlinks
tocrawl = [seed]
crawled = []
graph = {} # <url>:[list of pages it links to]
index = {}
while tocrawl:
page = tocrawl.pop()
if page not in crawled:
content = get_page(page)
add_page_to_index(index, page, content)
outlinks = get_all_links(content)
#Insert Code Here
graph[page]=outlinks
union(tocrawl, outlinks)
crawled.append(page)
return index, graph
cache = {
'http://udacity.com/cs101x/urank/index.html': """<html>
<body>
<h1>Dave's Cooking Algorithms</h1>
<p>
Here are my favorite recipes:
<ul>
<li> <a href="http://udacity.com/cs101x/urank/hummus.html">Hummus Recipe</a>
<li> <a href="http://udacity.com/cs101x/urank/arsenic.html">World's Best Hummus</a>
<li> <a href="http://udacity.com/cs101x/urank/kathleen.html">Kathleen's Hummus Recipe</a>
</ul>
For more expert opinions, check out the
<a href="http://udacity.com/cs101x/urank/nickel.html">Nickel Chef</a>
and <a href="http://udacity.com/cs101x/urank/zinc.html">Zinc Chef</a>.
</body>
</html>
""",
'http://udacity.com/cs101x/urank/zinc.html': """<html>
<body>
<h1>The Zinc Chef</h1>
<p>
I learned everything I know from
<a href="http://udacity.com/cs101x/urank/nickel.html">the Nickel Chef</a>.
</p>
<p>
For great hummus, try
<a href="http://udacity.com/cs101x/urank/arsenic.html">this recipe</a>.
</body>
</html>
""",
'http://udacity.com/cs101x/urank/nickel.html': """<html>
<body>
<h1>The Nickel Chef</h1>
<p>
This is the
<a href="http://udacity.com/cs101x/urank/kathleen.html">
best Hummus recipe!
</a>
</body>
</html>
""",
'http://udacity.com/cs101x/urank/kathleen.html': """<html>
<body>
<h1>
Kathleen's Hummus Recipe
</h1>
<p>
<ol>
<li> Open a can of garbanzo beans.
<li> Crush them in a blender.
<li> Add 3 tablespoons of tahini sauce.
<li> Squeeze in one lemon.
<li> Add salt, pepper, and buttercream frosting to taste.
</ol>
</body>
</html>
""",
'http://udacity.com/cs101x/urank/arsenic.html': """<html>
<body>
<h1>
The Arsenic Chef's World Famous Hummus Recipe
</h1>
<p>
<ol>
<li> Kidnap the <a href="http://udacity.com/cs101x/urank/nickel.html">Nickel Chef</a>.
<li> Force her to make hummus for you.
</ol>
</body>
</html>
""",
'http://udacity.com/cs101x/urank/hummus.html': """<html>
<body>
<h1>
Hummus Recipe
</h1>
<p>
<ol>
<li> Go to the store and buy a container of hummus.
<li> Open it.
</ol>
</body>
</html>
""",
}
def get_page(url):
if url in cache:
return cache[url]
else:
return None
def get_next_target(page):
start_link = page.find('<a href=')
if start_link == -1:
return None, 0
start_quote = page.find('"', start_link)
end_quote = page.find('"', start_quote + 1)
url = page[start_quote + 1:end_quote]
return url, end_quote
def get_all_links(page):
links = []
while True:
url, endpos = get_next_target(page)
if url:
links.append(url)
page = page[endpos:]
else:
break
return links
def union(a, b):
for e in b:
if e not in a:
a.append(e)
def add_page_to_index(index, url, content):
words = content.split()
for word in words:
add_to_index(index, word, url)
def add_to_index(index, keyword, url):
if keyword in index:
index[keyword].append(url)
else:
index[keyword] = [url]
def lookup(index, keyword):
if keyword in index:
return index[keyword]
else:
return None
index , graph = crawl_web('http://udacity.com/cs101x/urank/index.html')
if 'http://udacity.com/cs101x/urank/index.html' in graph:
print graph['http://udacity.com/cs101x/urank/index.html']
#>>> ['http://udacity.com/cs101x/urank/hummus.html',
#'http://udacity.com/cs101x/urank/arsenic.html',
#'http://udacity.com/cs101x/urank/kathleen.html',
#'http://udacity.com/cs101x/urank/nickel.html',
#'http://udacity.com/cs101x/urank/zinc.html']

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/sR8EJLpWwb4.mp4

Computing Page Rank

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/_ctzQdS3EfA.mp4

Formal Calculations

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/YZ3kRWKL0DI.mp4

Computer Ranks

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/sPaVbELrmh0.mp4

Finishing Urank

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/GovJzUltdL8.mp4

rank(page, time) is defined as: ∑
​p∈inlinks
​​
​outlinks

​d⋅rank(t−1,p)
​​ or:

rank(page, 0) = 1/npages

rank(page, t) = (1-d)/npages

     + sum (d * rank(p, t - 1) / number of outlinks from p) 
over all pages p that link to this page 

Thanks to Henry for suggesting to add this.
The URLs have changed around a bit! Here’s a new index page you can start with to test out your search engine: https://www.udacity.com/cs101x/urank/index.html


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
#Finishing the page ranking algorithm.
def compute_ranks(graph):
d = 0.8 # damping factor
numloops = 10
ranks = {}
npages = len(graph)
for page in graph:
ranks[page] = 1.0 / npages
for i in range(0, numloops):
newranks = {}
for page in graph:
newrank = (1 - d) / npages
#Insert Code Here
for p in graph:
if page in graph[p]:
newrank = newrank + d * ranks[p] / len(graph[p])
newranks[page] = newrank
ranks = newranks
return ranks
cache = {
'http://udacity.com/cs101x/urank/index.html': """<html>
<body>
<h1>Dave's Cooking Algorithms</h1>
<p>
Here are my favorite recipies:
<ul>
<li> <a href="http://udacity.com/cs101x/urank/hummus.html">Hummus Recipe</a>
<li> <a href="http://udacity.com/cs101x/urank/arsenic.html">World's Best Hummus</a>
<li> <a href="http://udacity.com/cs101x/urank/kathleen.html">Kathleen's Hummus Recipe</a>
</ul>
For more expert opinions, check out the
<a href="http://udacity.com/cs101x/urank/nickel.html">Nickel Chef</a>
and <a href="http://udacity.com/cs101x/urank/zinc.html">Zinc Chef</a>.
</body>
</html>
""",
'http://udacity.com/cs101x/urank/zinc.html': """<html>
<body>
<h1>The Zinc Chef</h1>
<p>
I learned everything I know from
<a href="http://udacity.com/cs101x/urank/nickel.html">the Nickel Chef</a>.
</p>
<p>
For great hummus, try
<a href="http://udacity.com/cs101x/urank/arsenic.html">this recipe</a>.
</body>
</html>
""",
'http://udacity.com/cs101x/urank/nickel.html': """<html>
<body>
<h1>The Nickel Chef</h1>
<p>
This is the
<a href="http://udacity.com/cs101x/urank/kathleen.html">
best Hummus recipe!
</a>
</body>
</html>
""",
'http://udacity.com/cs101x/urank/kathleen.html': """<html>
<body>
<h1>
Kathleen's Hummus Recipe
</h1>
<p>
<ol>
<li> Open a can of garbonzo beans.
<li> Crush them in a blender.
<li> Add 3 tablesppons of tahini sauce.
<li> Squeeze in one lemon.
<li> Add salt, pepper, and buttercream frosting to taste.
</ol>
</body>
</html>
""",
'http://udacity.com/cs101x/urank/arsenic.html': """<html>
<body>
<h1>
The Arsenic Chef's World Famous Hummus Recipe
</h1>
<p>
<ol>
<li> Kidnap the <a href="http://udacity.com/cs101x/urank/nickel.html">Nickel Chef</a>.
<li> Force her to make hummus for you.
</ol>
</body>
</html>
""",
'http://udacity.com/cs101x/urank/hummus.html': """<html>
<body>
<h1>
Hummus Recipe
</h1>
<p>
<ol>
<li> Go to the store and buy a container of hummus.
<li> Open it.
</ol>
</body>
</html>
""",
}
def crawl_web(seed): # returns index, graph of inlinks
tocrawl = [seed]
crawled = []
graph = {} # <url>, [list of pages it links to]
index = {}
while tocrawl:
page = tocrawl.pop()
if page not in crawled:
content = get_page(page)
add_page_to_index(index, page, content)
outlinks = get_all_links(content)
graph[page] = outlinks
union(tocrawl, outlinks)
crawled.append(page)
return index, graph
def get_page(url):
if url in cache:
return cache[url]
else:
return None
def get_next_target(page):
start_link = page.find('<a href=')
if start_link == -1:
return None, 0
start_quote = page.find('"', start_link)
end_quote = page.find('"', start_quote + 1)
url = page[start_quote + 1:end_quote]
return url, end_quote
def get_all_links(page):
links = []
while True:
url, endpos = get_next_target(page)
if url:
links.append(url)
page = page[endpos:]
else:
break
return links
def union(a, b):
for e in b:
if e not in a:
a.append(e)
def add_page_to_index(index, url, content):
words = content.split()
for word in words:
add_to_index(index, word, url)
def add_to_index(index, keyword, url):
if keyword in index:
index[keyword].append(url)
else:
index[keyword] = [url]
def lookup(index, keyword):
if keyword in index:
return index[keyword]
else:
return None
index, graph = crawl_web('http://udacity.com/cs101x/urank/index.html')
#print graph
ranks = compute_ranks(graph)
print ranks
#>>> {'http://udacity.com/cs101x/urank/kathleen.html': 0.11661866666666663,
#'http://udacity.com/cs101x/urank/zinc.html': 0.038666666666666655,
#'http://udacity.com/cs101x/urank/hummus.html': 0.038666666666666655,
#'http://udacity.com/cs101x/urank/arsenic.html': 0.054133333333333325,
#'http://udacity.com/cs101x/urank/index.html': 0.033333333333333326,
#'http://udacity.com/cs101x/urank/nickel.html': 0.09743999999999997}

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/k861qM5OqvU.mp4

Search Engine

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/7IlDnp39b0U.mp4

Problem Set

Recursive Grammars


https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Ej9obZ0QECY.mp4

Rabbits Multiplying

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
# Rabbits Multiplying
# A (slightly) more realistic model of rabbit multiplication than the Fibonacci
# model, would assume that rabbits eventually die. For this question, some
# rabbits die from month 6 onwards.
#
# Thus, we can model the number of rabbits as:
#
# rabbits(1) = 1 # There is one pair of immature rabbits in Month 1
# rabbits(2) = 1 # There is one pair of mature rabbits in Month 2
#
# For months 3-5:
# Same as Fibonacci model, no rabbits dying yet
# rabbits(n) = rabbits(n - 1) + rabbits(n - 2)
#
#
# For months > 5:
# All the rabbits that are over 5 months old die along with a few others
# so that the number that die is equal to the number alive 5 months ago.
# Before dying, the bunnies reproduce.
# rabbits(n) = rabbits(n - 1) + rabbits(n - 2) - rabbits(n - 5)
#
# This produces the rabbit sequence: 1, 1, 2, 3, 5, 7, 11, 16, 24, 35, 52, ...
#
# Define a procedure rabbits that takes as input a number n, and returns a
# number that is the value of the nth number in the rabbit sequence.
# For example, rabbits(10) -> 35. (It is okay if your procedure takes too
# long to run on inputs above 30.)
def rabbits(n):
#print rabbits(10)
#>>> 35
#s = ""
#for i in range(1,12):
# s = s + str(rabbits(i)) + " "
#print s
#>>> 1 1 2 3 5 7 11 16 24 35 52

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/pcGGCOPPtmE.mp4

Spreading Udaciousness

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
# Spreading Udaciousness
# One of our modest goals is to teach everyone in the world to program and
# understand computer science. To estimate how long this will take we have
# developed a (very flawed!) model:
# Everyone answering this question will convince a number, spread, (input to
# the model) of their friends to take the course next offering. This will
# continue, so that all of the newly recruited students, as well as the original
# students, will convince spread of their
# friends to take the following offering of the course.
# recruited friends are unique, so there is no duplication among the newly
# recruited students. Define a procedure, hexes_to_udaciousness(n, spread,
# target), that takes three inputs: the starting number of Udacians, the spread
# rate (how many new friends each Udacian convinces to join each hexamester),
# and the target number, and outputs the number of hexamesters needed to reach
# (or exceed) the target.
# For credit, your procedure must not use: while, for, or import math.
def hexes_to_udaciousness(n, spread, target):
# 0 more needed, since n already exceeds target
#print hexes_to_udaciousness(100000, 2, 36230)
#>>> 0
# after 1 hexamester, there will be 50000 + (50000 * 2) Udacians
#print hexes_to_udaciousness(50000, 2, 150000)
#>>> 1
# need to match or exceed the target
#print hexes_to_udaciousness(50000, 2, 150001)
#>>> 2
# only 12 hexamesters (2 years) to world domination!
#print hexes_to_udaciousness(20000, 2, 7 * 10 ** 9)
#>>> 12
# more friends means faster world domination!
#print hexes_to_udaciousness(15000, 3, 7 * 10 ** 9)
#>>> 10

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/q_atgGWy57Y.mp4

Deep Count

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
# Deep Count
# The built-in len operator outputs the number of top-level elements in a List,
# but not the total number of elements. For this question, your goal is to count
# the total number of elements in a list, including all of the inner lists.
# Define a procedure, deep_count, that takes as input a list, and outputs the
# total number of elements in the list, including all elements in lists that it
# contains.
# For this procedure, you will need a way to test if a value is a list. We have
# provided a procedure, is_list(p) that does this:
def is_list(p):
return isinstance(p, list)
# It is not necessary to understand how is_list works. It returns True if the
# input is a List, and returns False otherwise.
def deep_count(p):
#print deep_count([1, 2, 3])
#>>> 3
# The empty list still counts as an element of the outer list
#print deep_count([1, [], 3])
#>>> 3
#print deep_count([1, [1, 2, [3, 4]]])
#>>> 7
#print deep_count([[[[[[[[1, 2, 3]]]]]]]])
#>>> 10

Feeling Lucky

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
#Feeling Lucky
#In Unit 6, we implemented a page ranking algorithm, but didn't finish the final
#step of using it to improve our search results. For this question, you will use
#the page rankings to produce the best output for a given query.
#Define a procedure, lucky_search, that takes as input an index, a ranks
#dictionary (the result of compute_ranks), and a keyword, and returns the one
#URL most likely to be the best site for that keyword. If the keyword does not
#appear in the index, lucky_search should return None.
def lucky_search(index, ranks, keyword):
cache = {
'http://udacity.com/cs101x/urank/index.html': """<html>
<body>
<h1>Dave's Cooking Algorithms</h1>
<p>
Here are my favorite recipies:
<ul>
<li> <a href="http://udacity.com/cs101x/urank/hummus.html">Hummus Recipe</a>
<li> <a href="http://udacity.com/cs101x/urank/arsenic.html">World's Best Hummus</a>
<li> <a href="http://udacity.com/cs101x/urank/kathleen.html">Kathleen's Hummus Recipe</a>
</ul>
For more expert opinions, check out the
<a href="http://udacity.com/cs101x/urank/nickel.html">Nickel Chef</a>
and <a href="http://udacity.com/cs101x/urank/zinc.html">Zinc Chef</a>.
</body>
</html>
""",
'http://udacity.com/cs101x/urank/zinc.html': """<html>
<body>
<h1>The Zinc Chef</h1>
<p>
I learned everything I know from
<a href="http://udacity.com/cs101x/urank/nickel.html">the Nickel Chef</a>.
</p>
<p>
For great hummus, try
<a href="http://udacity.com/cs101x/urank/arsenic.html">this recipe</a>.
</body>
</html>
""",
'http://udacity.com/cs101x/urank/nickel.html': """<html>
<body>
<h1>The Nickel Chef</h1>
<p>
This is the
<a href="http://udacity.com/cs101x/urank/kathleen.html">
best Hummus recipe!
</a>
</body>
</html>
""",
'http://udacity.com/cs101x/urank/kathleen.html': """<html>
<body>
<h1>
Kathleen's Hummus Recipe
</h1>
<p>
<ol>
<li> Open a can of garbonzo beans.
<li> Crush them in a blender.
<li> Add 3 tablesppons of tahini sauce.
<li> Squeeze in one lemon.
<li> Add salt, pepper, and buttercream frosting to taste.
</ol>
</body>
</html>
""",
'http://udacity.com/cs101x/urank/arsenic.html': """<html>
<body>
<h1>
The Arsenic Chef's World Famous Hummus Recipe
</h1>
<p>
<ol>
<li> Kidnap the <a href="http://udacity.com/cs101x/urank/nickel.html">Nickel Chef</a>.
<li> Force her to make hummus for you.
</ol>
</body>
</html>
""",
'http://udacity.com/cs101x/urank/hummus.html': """<html>
<body>
<h1>
Hummus Recipe
</h1>
<p>
<ol>
<li> Go to the store and buy a container of hummus.
<li> Open it.
</ol>
</body>
</html>
""",
}
def get_page(url):
if url in cache:
return cache[url]
return ""
def get_next_target(page):
start_link = page.find('<a href=')
if start_link == -1:
return None, 0
start_quote = page.find('"', start_link)
end_quote = page.find('"', start_quote + 1)
url = page[start_quote + 1:end_quote]
return url, end_quote
def get_all_links(page):
links = []
while True:
url, endpos = get_next_target(page)
if url:
links.append(url)
page = page[endpos:]
else:
break
return links
def union(a, b):
for e in b:
if e not in a:
a.append(e)
def add_page_to_index(index, url, content):
words = content.split()
for word in words:
add_to_index(index, word, url)
def add_to_index(index, keyword, url):
if keyword in index:
index[keyword].append(url)
else:
index[keyword] = [url]
def lookup(index, keyword):
if keyword in index:
return index[keyword]
else:
return None
def crawl_web(seed): # returns index, graph of inlinks
tocrawl = [seed]
crawled = []
graph = {} # <url>, [list of pages it links to]
index = {}
while tocrawl:
page = tocrawl.pop()
if page not in crawled:
content = get_page(page)
add_page_to_index(index, page, content)
outlinks = get_all_links(content)
graph[page] = outlinks
union(tocrawl, outlinks)
crawled.append(page)
return index, graph
def compute_ranks(graph):
d = 0.8 # damping factor
numloops = 10
ranks = {}
npages = len(graph)
for page in graph:
ranks[page] = 1.0 / npages
for i in range(0, numloops):
newranks = {}
for page in graph:
newrank = (1 - d) / npages
for node in graph:
if page in graph[node]:
newrank = newrank + d * (ranks[node] / len(graph[node]))
newranks[page] = newrank
ranks = newranks
return ranks
#Here's an example of how your procedure should work on the test site:
#index, graph = crawl_web('http://udacity.com/cs101x/urank/index.html')
#ranks = compute_ranks(graph)
#print lucky_search(index, ranks, 'Hummus')
#>>> http://udacity.com/cs101x/urank/kathleen.html
#print lucky_search(index, ranks, 'the')
#>>> http://udacity.com/cs101x/urank/nickel.html
#print lucky_search(index, ranks, 'babaganoush')
#>>> None

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/6qVB4lZmzMc.mp4

1
2
3
4
5
6
7
8
9
def lucky_search(index, ranks, keyword):
pages=lookup(index,keyword)
if not pages:
return None
best_page=pages[0]
for candidate in pages:
if ranks[candidate]>ranks[best_page]:
best_page=candidate
return best_page

Problem Set 6 Starred

Family Trees

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
# Single Gold Star
# Family Trees
# In the lecture, we showed a recursive definition for your ancestors. For this
# question, your goal is to define a procedure that finds someone's ancestors,
# given a Dictionary that provides the parent relationships.
# Here's an example of an input Dictionary:
ada_family = { 'Judith Blunt-Lytton': ['Anne Isabella Blunt', 'Wilfrid Scawen Blunt'],
'Ada King-Milbanke': ['Ralph King-Milbanke', 'Fanny Heriot'],
'Ralph King-Milbanke': ['Augusta Ada King', 'William King-Noel'],
'Anne Isabella Blunt': ['Augusta Ada King', 'William King-Noel'],
'Byron King-Noel': ['Augusta Ada King', 'William King-Noel'],
'Augusta Ada King': ['Anne Isabella Milbanke', 'George Gordon Byron'],
'George Gordon Byron': ['Catherine Gordon', 'Captain John Byron'],
'John Byron': ['Vice-Admiral John Byron', 'Sophia Trevannion'] }
# Define a procedure, ancestors(genealogy, person), that takes as its first input
# a Dictionary in the form given above, and as its second input the name of a
# person. It should return a list giving all the known ancestors of the input
# person (this should be the empty list if there are none). The order of the list
# does not matter and duplicates will be ignored.
def ancestors(genealogy, person):
# Here are some examples:
#print ancestors(ada_family, 'Augusta Ada King')
#>>> ['Anne Isabella Milbanke', 'George Gordon Byron',
# 'Catherine Gordon','Captain John Byron']
#print ancestors(ada_family, 'Judith Blunt-Lytton')
#>>> ['Anne Isabella Blunt', 'Wilfrid Scawen Blunt', 'Augusta Ada King',
# 'William King-Noel', 'Anne Isabella Milbanke', 'George Gordon Byron',
# 'Catherine Gordon', 'Captain John Byron']
#print ancestors(ada_family, 'Dave')
#>>> []

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/SQ6508of_ZA.mp4

Khayyam Triangle

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
# Double Gold Star
# Khayyam Triangle
# The French mathematician, Blaise Pascal, who built a mechanical computer in
# the 17th century, studied a pattern of numbers now commonly known in parts of
# the world as Pascal's Triangle (it was also previously studied by many Indian,
# Chinese, and Persian mathematicians, and is known by different names in other
# parts of the world).
# The pattern is shown below:
# 1
# 1 1
# 1 2 1
# 1 3 3 1
# 1 4 6 4 1
# ...
# Each number is the sum of the number above it to the left and the number above
# it to the right (any missing numbers are counted as 0).
# Define a procedure, triangle(n), that takes a number n as its input, and
# returns a list of the first n rows in the triangle. Each element of the
# returned list should be a list of the numbers at the corresponding row in the
# triangle.
def triangle(n):
#For example:
#print triangle(0)
#>>> []
#print triangle(1)
#>>> [[1]]
#print triangle(2)
#>> [[1], [1, 1]]
#print triangle(3)
#>>> [[1], [1, 1], [1, 2, 1]]
#print triangle(6)
#>>> [[1], [1, 1], [1, 2, 1], [1, 3, 3, 1], [1, 4, 6, 4, 1], [1, 5, 10, 10, 5, 1]]

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/i8X3KHanfXE.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
def make_next_row(row):
prev=0
result=[]
for i in row:
result.append(i+prev)
prev=i
result.append(prev)
return result
def triangle(n):
current=[1]
result=[]
for unuse in range(0,n):
result.append(current)
current=make_next_row(current)
return result

Only a Little Lucky

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
# Triple Gold Star
# Only A Little Lucky
# The Feeling Lucky question (from the regular homework) assumed it was enough
# to find the best-ranked page for a given query. For most queries, though, we
# don't just want the best page (according to the page ranking algorithm), we
# want a list of many pages that match the query, ordered from the most likely
# to be useful to the least likely.
# Your goal for this question is to define a procedure, ordered_search(index,
# ranks, keyword), that takes the same inputs as lucky_search from Question 5,
# but returns an ordered list of all the URLs that match the query.
# To order the pages, use the quicksort algorithm, invented by Sir Tony Hoare in
# 1959. Quicksort provides a way to sort any list of data, using an expected
# number of comparisons that scales as n log n where n is the number of elements
# in the list.
# The idea of quicksort is quite simple:
# If the list has zero or one elements, it is already sorted.
# Otherwise, pick a pivot element, and split the list into two partitions: one
# contains all the elements equal to or lower than the value of the pivot
# element, and the other contains all the elements that are greater than the
# pivot element. Recursively sort each of the sub-lists, and then return the
# result of concatenating the sorted left sub-list, the pivot element, and the
# sorted right sub-list.
# For simplicity, use the first element in the list as your pivot element (this
# is not usually a good choice, since it means if the input list is already
# nearly sorted, the actual work will be much worse than expected).
def ordered_search(index, ranks, keyword):
cache = {
'http://udacity.com/cs101x/urank/index.html': """<html>
<body>
<h1>Dave's Cooking Algorithms</h1>
<p>
Here are my favorite recipies:
<ul>
<li> <a href="http://udacity.com/cs101x/urank/hummus.html">Hummus Recipe</a>
<li> <a href="http://udacity.com/cs101x/urank/arsenic.html">World's Best Hummus</a>
<li> <a href="http://udacity.com/cs101x/urank/kathleen.html">Kathleen's Hummus Recipe</a>
</ul>
For more expert opinions, check out the
<a href="http://udacity.com/cs101x/urank/nickel.html">Nickel Chef</a>
and <a href="http://udacity.com/cs101x/urank/zinc.html">Zinc Chef</a>.
</body>
</html>
""",
'http://udacity.com/cs101x/urank/zinc.html': """<html>
<body>
<h1>The Zinc Chef</h1>
<p>
I learned everything I know from
<a href="http://udacity.com/cs101x/urank/nickel.html">the Nickel Chef</a>.
</p>
<p>
For great hummus, try
<a href="http://udacity.com/cs101x/urank/arsenic.html">this recipe</a>.
</body>
</html>
""",
'http://udacity.com/cs101x/urank/nickel.html': """<html>
<body>
<h1>The Nickel Chef</h1>
<p>
This is the
<a href="http://udacity.com/cs101x/urank/kathleen.html">
best Hummus recipe!
</a>
</body>
</html>
""",
'http://udacity.com/cs101x/urank/kathleen.html': """<html>
<body>
<h1>
Kathleen's Hummus Recipe
</h1>
<p>
<ol>
<li> Open a can of garbonzo beans.
<li> Crush them in a blender.
<li> Add 3 tablesppons of tahini sauce.
<li> Squeeze in one lemon.
<li> Add salt, pepper, and buttercream frosting to taste.
</ol>
</body>
</html>
""",
'http://udacity.com/cs101x/urank/arsenic.html': """<html>
<body>
<h1>
The Arsenic Chef's World Famous Hummus Recipe
</h1>
<p>
<ol>
<li> Kidnap the <a href="http://udacity.com/cs101x/urank/nickel.html">Nickel Chef</a>.
<li> Force her to make hummus for you.
</ol>
</body>
</html>
""",
'http://udacity.com/cs101x/urank/hummus.html': """<html>
<body>
<h1>
Hummus Recipe
</h1>
<p>
<ol>
<li> Go to the store and buy a container of hummus.
<li> Open it.
</ol>
</body>
</html>
""",
}
def get_page(url):
if url in cache:
return cache[url]
return ""
def get_next_target(page):
start_link = page.find('<a href=')
if start_link == -1:
return None, 0
start_quote = page.find('"', start_link)
end_quote = page.find('"', start_quote + 1)
url = page[start_quote + 1:end_quote]
return url, end_quote
def get_all_links(page):
links = []
while True:
url, endpos = get_next_target(page)
if url:
links.append(url)
page = page[endpos:]
else:
break
return links
def union(a, b):
for e in b:
if e not in a:
a.append(e)
def add_page_to_index(index, url, content):
words = content.split()
for word in words:
add_to_index(index, word, url)
def add_to_index(index, keyword, url):
if keyword in index:
index[keyword].append(url)
else:
index[keyword] = [url]
def lookup(index, keyword):
if keyword in index:
return index[keyword]
else:
return None
def crawl_web(seed): # returns index, graph of inlinks
tocrawl = [seed]
crawled = []
graph = {} # <url>, [list of pages it links to]
index = {}
while tocrawl:
page = tocrawl.pop()
if page not in crawled:
content = get_page(page)
add_page_to_index(index, page, content)
outlinks = get_all_links(content)
graph[page] = outlinks
union(tocrawl, outlinks)
crawled.append(page)
return index, graph
def compute_ranks(graph):
d = 0.8 # damping factor
numloops = 10
ranks = {}
npages = len(graph)
for page in graph:
ranks[page] = 1.0 / npages
for i in range(0, numloops):
newranks = {}
for page in graph:
newrank = (1 - d) / npages
for node in graph:
if page in graph[node]:
newrank = newrank + d * (ranks[node] / len(graph[node]))
newranks[page] = newrank
ranks = newranks
return ranks
# Here are some example showing what ordered_search should do:
# Observe that the result list is sorted so the highest-ranking site is at the
# beginning of the list.
# Note: the intent of this question is for students to write their own sorting
# code, not to use the built-in sort procedure.
index, graph = crawl_web('http://udacity.com/cs101x/urank/index.html')
ranks = compute_ranks(graph)
#print ordered_search(index, ranks, 'Hummus')
#>>> ['http://udacity.com/cs101x/urank/kathleen.html',
# 'http://udacity.com/cs101x/urank/nickel.html',
# 'http://udacity.com/cs101x/urank/arsenic.html',
# 'http://udacity.com/cs101x/urank/hummus.html',
# 'http://udacity.com/cs101x/urank/index.html']
#print ordered_search(index, ranks, 'the')
#>>> ['http://udacity.com/cs101x/urank/nickel.html',
# 'http://udacity.com/cs101x/urank/arsenic.html',
# 'http://udacity.com/cs101x/urank/hummus.html',
# 'http://udacity.com/cs101x/urank/index.html']
#print ordered_search(index, ranks, 'babaganoush')
#>>> None

The provided solution isn’t complete - it doesn’t actually include the ordered_search code, but only the code for sorting the pages.

Atlas7-115196 provided a more complete solution to this problem! See the forum post: http://forums.udacity.com/questions/100371211/corrected-udacity-solution#cs101

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
def quicksort_urls(ranks,urls):
if not urls or len(urls)<=1:
return urls
else:
pivot=ranks[urls[0]]
before=[]
after=[]
for url in urls[1:]:
if ranks[url]>=pivot:
before.append(url)
else:
after.append(url)
return quicksort_urls(ranks,before)+[urls[0]]+quicksort_urls(ranks,after)
def ordered_search(index, ranks, keyword):
urls=lookup(index, keyword)
return quicksort_urls(ranks,urls)

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/lI9O8wUEDFc.mp4

Q&A

Pythonic

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/2g6qtjwKkA0.mp4
Correction - Peter Norvig teaches class CS212.

Python Versions

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/owH7bqKiR-g.mp4

Using Recursion

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/VWyHjEh0tfA.mp4

Recursion in Other Languages

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/siNYLJ1YaAc.mp4

Pagerank

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/rN-5K_q4JDc.mp4

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ulkWpQl6izE.mp4

International Characters

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/9QbeX7LOl0g.mp4

Past, Present, and Future of Computer

Past, Present, and Future

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/4TChbk9pnxQ.mp4

Themes

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/-6OLwm10pqs.mp4

Overview

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/-atl1N1mvu0.mp4

Computer Science

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/YBJk5Z5bAEA.mp4

Computer Science

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/DIbtX0GqIA8.mp4

Past of Computing

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/vvIj_PWFoyY.mp4

Computer History Museum

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/ClTnWszPp3Q.mp4

Babbage Engine

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/5nYcND7WjCY.mp4

First Hard Drive

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/oSCCFDZLRgY.mp4

Search Before Computers

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/F2DTZoa-zPo.mp4

Search on the Web

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/mgR9sInLwfc.mp4

Present of Computing

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/9fxDFZGwUiA.mp4

Slac and Big Data

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/4_0sCB_csRI.mp4

Mozilla

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/_IfqKBbEqck.mp4

Open Source

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TUBAD93kCfA.mp4

Getting Involved

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/56KQGpGOwLM.mp4

Having an Impact

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/lL36LxpsXBI.mp4

Benetech

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/FqQFwXOJTeU.mp4

Future of Computing

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/EsTiQxNDQfo.mp4

Text Analysis

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/679-n8LWaVo.mp4

Energy Aware Computing

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/wLyAANVyJQM.mp4

Computer Security

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/H87Yxc4p-C8.mp4

Theory of Computation

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/W7nD3AMJDAI.mp4

Quantum Computing

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/XafsCK3yk4U.mp4

Stay Udacious

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/oDxqlHY6V1w.mp4

Cumulative Practice Problems

Pick One

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# Question 1: Pick One
# Define a procedure, pick_one, that takes three inputs: a Boolean
# and two other values. If the first input is True, it should return
# the second input. If the first input is False, it should return the
# third input.
# For example, pick_one(True, 37, 'hello') should return 37, and
# pick_one(False, 37, 'hello') should return 'hello'.
def pick_one():
print pick_one(True, 37, 'hello')
#>>> 37
print pick_one(False, 37, 'hello')
#>>> hello
print pick_one(True, 'red pill', 'blue pill')
#>>> red pill
print pick_one(False, 'sunny', 'rainy')
#>>> rainy

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/K2s3b4XTaN0.mp4

1
2

Triangular Numbers

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# Question 2. Triangular Numbers
# The triangular numbers are the numbers 1, 3, 6, 10, 15, 21, ...
# They are calculated as follows.
# 1
# 1 + 2 = 3
# 1 + 2 + 3 = 6
# 1 + 2 + 3 + 4 = 10
# 1 + 2 + 3 + 4 + 5 = 15
# Write a procedure, triangular, that takes as its input a positive
# integer n and returns the nth triangular number.
def triangular():
print triangular(1)
#>>>1
print triangular(3)
#>>> 6
print triangular(10)
#>>> 55

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/tQplGM4DtTA.mp4

Linear Time


For the procedures below, check the procedures whose running time scales linearly with the length of the input in the worst case. You may assume the elements in input_list are fairly small numbers.

def proc1(input_list):
    maximum = None
    for element in input_list :
        if not maximum or maximum < element:
            maximum = element
    return maximum

def proc2(input_list):
    sum = 0
    while len(input_list) > 0:
        sum = sum + input_list[0]   # Assume input_list[0] is constant time
        input_list = input_list[1:]  # Assume input_list[1:] is constant time
    return sum

def proc3(input_list):
    for i in range(0, len(input_list)):
        for j in range(0, len(input_list)):
            if input_list[i] == input_list[j] and i != j:
                return False
    return True 

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/nnNQTjmc3DY.mp4

Remove Tags

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
# Question 4: Remove Tags
# When we add our words to the index, we don't really want to include
# html tags such as <body>, <head>, <table>, <a href="..."> and so on.
# Write a procedure, remove_tags, that takes as input a string and returns
# a list of words, in order, with the tags removed. Tags are defined to be
# strings surrounded by < >. Words are separated by whitespace or tags.
# You may assume the input does not include any unclosed tags, that is,
# there will be no '<' without a following '>'.
def remove_tags():
print remove_tags('''<h1>Title</h1><p>This is a
<a href="http://www.udacity.com">link</a>.<p>''')
#>>> ['Title','This','is','a','link','.']
print remove_tags('''<table cellpadding='3'>
<tr><td>Hello</td><td>World!</td></tr>
</table>''')
#>>> ['Hello','World!']
print remove_tags("<hello><goodbye>")
#>>> []
print remove_tags("This is plain text.")
#>>> ['This', 'is', 'plain', 'text.']

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/5LrTUeawyfI.mp4

1
2
3
4
5
6
7
def remove_tags(content):
start_pos = content.find('<')
while start_pos !=-1:
end_pos = content.find('>',start_pos)
content=content[:start_pos]+' '+content[end_pos+1:]
start_pos = content.find('<')
return content.split()

Date Converter

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
# Question 5: Date Converter
# Write a procedure date_converter which takes two inputs. The first is
# a dictionary and the second a string. The string is a valid date in
# the format month/day/year. The procedure should return
# the date written in the form <day> <name of month> <year>.
# For example , if the
# dictionary is in English,
english = {1:"January", 2:"February", 3:"March", 4:"April", 5:"May",
6:"June", 7:"July", 8:"August", 9:"September",10:"October",
11:"November", 12:"December"}
# then "5/11/2012" should be converted to "11 May 2012".
# If the dictionary is in Swedish
swedish = {1:"januari", 2:"februari", 3:"mars", 4:"april", 5:"maj",
6:"juni", 7:"juli", 8:"augusti", 9:"september",10:"oktober",
11:"november", 12:"december"}
# then "5/11/2012" should be converted to "11 maj 2012".
# Hint: int('12') converts the string '12' to the integer 12.
def date_converter():
print date_converter(english, '5/11/2012')
#>>> 11 May 2012
print date_converter(english, '5/11/12')
#>>> 11 May 12
print date_converter(swedish, '5/11/2012')
#>>> 11 maj 2012
print date_converter(swedish, '12/5/1791')
#>>> 5 december 1791

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/YgW5c4mH19g.mp4

1
2
3
4
5
6
7
def date_converter(dic,string_date):
start_pos=string_date.find('/')
end_pos=string_date.find('/',start_pos+1)
month=int(string_date[:start_pos])
day = string_date[start_pos+1:end_pos]
year=string_date[end_pos+1:]
return day+' '+dic[month]+' '+year

or

1
2
3
def date_converter(dic,string_date):
month,day,year=string_date.split('/')
return day+' '+dic[int(month)]+' '+year

Termination


For each of the procedures defined below, check the box if the procedure always terminates for all inputs that are natural numbers (1,2,3…).

def proc1(n):
   while True:
      n = n - 1
      if n == 0:
         break
   return 3

def proc2(n):
   if n == 0:
      return n
   return 1 + proc2(n - 2)

def proc3(n):
   if n <= 3:
      return 1
   return proc3(n - 1) + proc3(n - 2) + proc3(n - 3)

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/RBc-2ZVuH9E.mp4

Find and Replace

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
# Question 7: Find and Replace
# For this question you need to define two procedures:
# make_converter(match, replacement)
# Takes as input two strings and returns a converter. It doesn't have
# to make a specific type of thing. It can
# return anything you would find useful in apply_converter.
# apply_converter(converter, string)
# Takes as input a converter (produced by make_converter), and
# a string, and returns the result of applying the converter to the
# input string. This replaces all occurrences of the match used to
# build the converter, with the replacement. It keeps doing
# replacements until there are no more opportunities for replacements.
def make_converter(match, replacement):
def apply_converter(converter, string):
# For example,
c1 = make_converter('aa', 'a')
print apply_converter(c1, 'aaaa')
#>>> a
c = make_converter('aba', 'b')
print apply_converter(c, 'aaaaaabaaaaa')
#>>> ab
# Note that this process is not guaranteed to terminate for all inputs
# (for example, apply_converter(make_converter('a', 'aa'), 'a') would
# run forever).

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/-xUQEeXjC8g.mp4

1
2
3
4
5
6
7
8
9
10
11
def make_converter(match, replacement):
return [match, replacement]
def apply_converter(converter, string):
previous=None
while previous!=string:
previous=string
position=string.find(converter[0])
if position!=-1:
string=string[:position] + converter[1] + string[position+len(converter[0]):]
return string

Longest Repetition

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
# Question 8: Longest Repetition
# Define a procedure, longest_repetition, that takes as input a
# list, and returns the element in the list that has the most
# consecutive repetitions. If there are multiple elements that
# have the same number of longest repetitions, the result should
# be the one that appears first. If the input list is empty,
# it should return None.
def longest_repetition():
#For example,
print longest_repetition([1, 2, 2, 3, 3, 3, 2, 2, 1])
# 3
print longest_repetition(['a', 'b', 'b', 'b', 'c', 'd', 'd', 'd'])
# b
print longest_repetition([1,2,3,4,5])
# 1
print longest_repetition([])
# None

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/LWSvPPfH9Cw.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
def longest_repetition(input_list):
best=None
length=0
current=None
current_length=0
for element in input_list:
if current != element:
current_length=1
current=element
else:
current_length+=1
if current_length>length:
best=current
length=current_length
return best

Deep Reverse

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
# Question 9: Deep Reverse
# Define a procedure, deep_reverse, that takes as input a list,
# and returns a new list that is the deep reverse of the input list.
# This means it reverses all the elements in the list, and if any
# of those elements are lists themselves, reverses all the elements
# in the inner list, all the way down.
# Note: The procedure must not change the input list.
# The procedure is_list below is from Homework 6. It returns True if
# p is a list and False if it is not.
def is_list(p):
return isinstance(p, list)
def deep_reverse():
#For example,
p = [1, [2, 3, [4, [5, 6]]]]
print deep_reverse(p)
#>>> [[[[6, 5], 4], 3, 2], 1]
print p
#>>> [1, [2, 3, [4, [5, 6]]]]
q = [1, [2,3], 4, [5,6]]
print deep_reverse(q)
#>>> [ [6,5], 4, [3, 2], 1]
print q
#>>> [1, [2,3], 4, [5,6]]

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/aYLkoPSiiG0.mp4

1
2
3
4
5
6
7
8
def deep_reverse(input_list):
if is_list(input_list):
new_list=[]
for i in range(len(input_list)-1,-1,-1):
new_list.append(deep_reverse(input_list[i]))
return new_list
else:
return input_list

Challenging Practice Problems

Stirling and Bell

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
# One Gold Star
# Question 1-star: Stirling and Bell Numbers
# The number of ways of splitting n items in k non-empty sets is called
# the Stirling number, S(n,k), of the second kind. For example, the group
# of people Dave, Sarah, Peter and Andy could be split into two groups in
# the following ways.
# 1. Dave, Sarah, Peter Andy
# 2. Dave, Sarah, Andy Peter
# 3. Dave, Andy, Peter Sarah
# 4. Sarah, Andy, Peter Dave
# 5. Dave, Sarah Andy, Peter
# 6. Dave, Andy Sarah, Peter
# 7. Dave, Peter Andy, Sarah
# so S(4,2) = 7
# If instead we split the group into one group, we have just one way to
# do it.
# 1. Dave, Sarah, Peter, Andy
# so S(4,1) = 1
# or into four groups, there is just one way to do it as well
# 1. Dave Sarah Peter Andy
# so S(4,4) = 1
# If we try to split into more groups than we have people, there are no
# ways to do it.
# The formula for calculating the Stirling numbers is
# S(n, k) = k*S(n-1, k) + S(n-1, k-1)
# Furthermore, the Bell number B(n) is the number of ways of splitting n
# into any number of parts, that is,
# B(n) is the sum of S(n,k) for k =1,2, ... , n.
# Write two procedures, stirling and bell. The first procedure, stirling
# takes as its inputs two positive integers of which the first is the
# number of items and the second is the number of sets into which those
# items will be split. The second procedure, bell, takes as input a
# positive integer n and returns the Bell number B(n).
def stirling():
def bell():
#print stirling(1,1)
#>>> 1
#print stirling(2,1)
#>>> 1
#print stirling(2,2)
#>>> 1
#print stirling(2,3)
#>>>0
#print stirling(3,1)
#>>> 1
#print stirling(3,2)
#>>> 3
#print stirling(3,3)
#>>> 1
#print stirling(4,1)
#>>> 1
#print stirling(4,2)
#>>> 7
#print stirling(4,3)
#>>> 6
#print stirling(4,4)
#>>> 1
#print stirling(5,1)
#>>> 1
#print stirling(5,2)
#>>> 15
#print stirling(5,3)
#>>> 25
#print stirling(5,4)
#>>> 10
#print stirling(5,5)
#>>> 1
#print stirling(20,15)
#>>> 452329200
#print bell(1)
#>>> 1
#print bell(2)
#>>> 2
#print bell(3)
#>>> 5
#print bell(4)
#>>> 15
#print bell(5)
#>>> 52
#print bell(15)
#>>> 1382958545

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/TJ4M9ZyAZ3I.mp4

1
2
3
4
5
6
7
8
9
10
11
12
def stirling(n,k):
if n<k:
return 0
if n==k or k==1:
return 1
return k*stirling(n-1, k) + stirling(n-1, k-1)
def bell(n):
result=0
for i in range(1,n+1):
result+=stirling(n,i)
return result

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
# Two Gold Stars
# Question 2: Combatting Link Spam
# One of the problems with our page ranking system is pages can
# collude with each other to improve their page ranks. We consider
# A->B a reciprocal link if there is a link path from B to A of length
# equal to or below the collusion level, k. The length of a link path
# is the number of links which are taken to travel from one page to the
# other.
# If k = 0, then a link from A to A is a reciprocal link for node A,
# since no links needs to be taken to get from A to A.
# If k=1, B->A would count as a reciprocal link if there is a link
# A->B, which includes one link and so is of length 1. (it requires
# two parties, A and B, to collude to increase each others page rank).
# If k=2, B->A would count as a reciprocal link for node A if there is
# a path A->C->B, for some page C, (link path of length 2),
# or a direct link A-> B (link path of length 1).
# Modify the compute_ranks code to
# - take an extra input k, which is a non-negative integer, and
# - exclude reciprocal links of length up to and including k from
# helping the page rank.
def compute_ranks(graph):
d = 0.8 # damping factor
numloops = 10
ranks = {}
npages = len(graph)
for page in graph:
ranks[page] = 1.0 / npages
for i in range(0, numloops):
newranks = {}
for page in graph:
newrank = (1 - d) / npages
for node in graph:
if page in graph[node]:
newrank = newrank + d * (ranks[node]/len(graph[node]))
newranks[page] = newrank
ranks = newranks
return ranks
# For example
g = {'a': ['a', 'b', 'c'], 'b':['a'], 'c':['d'], 'd':['a']}
#print compute_ranks(g, 0) # the a->a link is reciprocal
#>>> {'a': 0.26676872354238684, 'c': 0.1216391112164609,
# 'b': 0.1216391112164609, 'd': 0.1476647842238683}
#print compute_ranks(g, 1) # a->a, a->b, b->a links are reciprocal
#>>> {'a': 0.14761759762962962, 'c': 0.08936469270123457,
# 'b': 0.04999999999999999, 'd': 0.12202199703703702}
#print compute_ranks(g, 2)
# a->a, a->b, b->a, a->c, c->d, d->a links are reciprocal
# (so all pages end up with the same rank)
#>>> {'a': 0.04999999999999999, 'c': 0.04999999999999999,
# 'b': 0.04999999999999999, 'd': 0.04999999999999999}

There was a typo in the last test example.

# a->a, a->b, b->a, a->c, c->d, c->a links are reciprocal

should read

# a->a, a->b, b->a, a->c, c->d, d->a links are reciprocal

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/hP-UDkuUL0o.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
def is_reciprocal_link(graph,source,destination,k):
if k==0:
return source==destination
if source in graph[destination]:
return True
for node in graph[destination]:
if is_reciprocal_link(graph,source,node,k-1):
return True
return False
def compute_ranks(graph,k):
d = 0.8 # damping factor
numloops = 10
ranks = {}
npages = len(graph)
for page in graph:
ranks[page] = 1.0 / npages
for i in range(0, numloops):
newranks = {}
for page in graph:
newrank = (1 - d) / npages
for node in graph:
if page in graph[node]: # node links to page
if not is_reciprocal_link(graph,node,page,k):
newrank = newrank + d * (ranks[node]/len(graph[node]))
newranks[page] = newrank
ranks = newranks
return ranks

Elementary Cellular Automaton

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/M_pkidxeGMY.mp4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
# THREE GOLD STARS
# Question 3-star: Elementary Cellular Automaton
# Please see the video for additional explanation.
# A one-dimensional cellular automata takes in a string, which in our
# case, consists of the characters '.' and 'x', and changes it according
# to some predetermined rules. The rules consider three characters, which
# are a character at position k and its two neighbours, and determine
# what the character at the corresponding position k will be in the new
# string.
# For example, if the character at position k in the string is '.' and
# its neighbours are '.' and 'x', then the pattern is '..x'. We look up
# '..x' in the table below. In the table, '..x' corresponds to 'x' which
# means that in the new string, 'x' will be at position k.
# Rules:
# pattern in position k in contribution to
# Value current string new string pattern number
# is 0 if replaced by '.'
# and value if replaced
# by 'x'
# 1 '...' '.' 1 * 0
# 2 '..x' 'x' 2 * 1
# 4 '.x.' 'x' 4 * 1
# 8 '.xx' 'x' 8 * 1
# 16 'x..' '.' 16 * 0
# 32 'x.x' '.' 32 * 0
# 64 'xx.' '.' 64 * 0
# 128 'xxx' 'x' 128 * 1
# ----------
# 142
# To calculate the patterns which will have the central character x, work
# out the values required to sum to the pattern number. For example,
# 32 = 32 so only pattern 32 which is x.x changes the central position to
# an x. All the others have a . in the next line.
# 23 = 16 + 4 + 2 + 1 which means that 'x..', '.x.', '..x' and '...' all
# lead to an 'x' in the next line and the rest have a '.'
# For pattern 142, and starting string
# ...........x...........
# the new strings created will be
# ..........xx........... (generations = 1)
# .........xx............ (generations = 2)
# ........xx............. (generations = 3)
# .......xx.............. (generations = 4)
# ......xx............... (generations = 5)
# .....xx................ (generations = 6)
# ....xx................. (generations = 7)
# ...xx.................. (generations = 8)
# ..xx................... (generations = 9)
# .xx.................... (generations = 10)
# Note that the first position of the string is next to the last position
# in the string.
# Define a procedure, cellular_automaton, that takes three inputs:
# a non-empty string,
# a pattern number which is an integer between 0 and 255 that
# represents a set of rules, and
# a positive integer, n, which is the number of generations.
# The procedure should return a string which is the result of
# applying the rules generated by the pattern to the string n times.
def cellular_automaton():
print cellular_automaton('.x.x.x.x.', 17, 2)
#>>> xxxxxxx..
print cellular_automaton('.x.x.x.x.', 249, 3)
#>>> .x..x.x.x
print cellular_automaton('...x....', 125, 1)
#>>> xx.xxxxx
print cellular_automaton('...x....', 125, 2)
#>>> .xxx....
print cellular_automaton('...x....', 125, 3)
#>>> .x.xxxxx
print cellular_automaton('...x....', 125, 4)
#>>> xxxx...x
print cellular_automaton('...x....', 125, 5)
#>>> ...xxx.x
print cellular_automaton('...x....', 125, 6)
#>>> xx.x.xxx
print cellular_automaton('...x....', 125, 7)
#>>> .xxxxx..
print cellular_automaton('...x....', 125, 8)
#>>> .x...xxx
print cellular_automaton('...x....', 125, 9)
#>>> xxxx.x.x
print cellular_automaton('...x....', 125, 10)
#>>> ...xxxxx

Sorry about this. There is a mistake in the video in generation 3 for pattern 30, which makes all the following lines incorrect as well. The corrected output is:

...x....  (input)
..xxx...  ( generations = 1)
.xx..x..  ( generations = 2)
xx.xxxx.  ( generations = 3) 
x..x....  ( generations = 4)
xxxxx..x  ( generations = 5)
.....xxx  ( generations = 6)

Additional information: Elementary Cellular Automaton at Wolfram’s Mathworld

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
def cellular_automaton(input_string,pattern_number,generation):
patterns={}
pattern_list=['...','..x','.x.','.xx','x..','x.x' ,'xx.','xxx']
n=len(input_string)
# build my patterns dictionary
for i in range(7,-1,-1):
if pattern_number/(2**i)==1:
patterns[pattern_list[i]]='x'
pattern_number=pattern_number-2**i
else:
patterns[pattern_list[i]]='.'
# apply patterns to input_string
# with generation times
for unuse in range(generation):
new_string=''
for i in range(n):
pattern=input_string[i-1]+input_string[i]+input_string[(i+1)%n]
new_string = new_string + patterns[pattern]
input_string=new_string
return new_string

https://s3.cn-north-1.amazonaws.com.cn/u-vid-hd/Jc1vOOWfQaA.mp4

Code Editor

A Place to Try Things Out

1
2
3
4
5
6
7
# Use this to try out anything you like. Use print to display your answer
# when you press the "Test Run" button.
# Use the "Reset" button to reset the screen
a = 1e+9
for i in range(1000000):
a+=1
print a - 1e+9

Project Prep

Project Description

Final Project Description

Congratulations on making it to the final project! Your job is to take simple text strings like “Alex likes Carla, Ryan, and Priya” and turn them into a social network. To do this, you must complete a number of required procedures, as described on the next screen. You must also create a “make-your-own” procedure.

Most of this project will take place inside the browser and most of it will be auto-graded. Feel free to share your final code with your peers in the Discussion Forum for additional feedback.

If you have any questions, ask on the Discussion Forum!

Gamer’s Network

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
# --------------------------- #
# Intro to CS Final Project #
# Gaming Social Network #
# --------------------------- #
#
# Background
# ==========
# You and your friend have decided to start a company that hosts a gaming
# social network site. Your friend will handle the website creation (they know
# what they are doing, having taken our web development class). However, it is
# up to you to create a data structure that manages the game-network information
# and to define several procedures that operate on the network.
#
# In a website, the data is stored in a database. In our case, however, all the
# information comes in a big string of text. Each pair of sentences in the text
# is formatted as follows:
#
# <user> is connected to <user1>, ..., <userM>.<user> likes to play <game1>, ..., <gameN>.
#
# For example:
#
# John is connected to Bryant, Debra, Walter.John likes to play The Movie: The Game, The Legend of Corgi, Dinosaur Diner.
#
# Note that each sentence will be separated from the next by only a period. There will
# not be whitespace or new lines between sentences.
#
# Your friend records the information in that string based on user activity on
# the website and gives it to you to manage. You can think of every pair of
# sentences as defining a user's profile.
#
# Consider the data structures that we have used in class - lists, dictionaries,
# and combinations of the two (e.g. lists of dictionaries). Pick one that
# will allow you to manage the data above and implement the procedures below.
#
# You may assume that <user> is a unique identifier for a user. For example, there
# can be at most one 'John' in the network. Furthermore, connections are not
# symmetric - if 'Bob' is connected to 'Alice', it does not mean that 'Alice' is
# connected to 'Bob'.
#
# Project Description
# ====================
# Your task is to complete the procedures according to the specifications below
# as well as to implement a Make-Your-Own procedure (MYOP). You are encouraged
# to define any additional helper procedures that can assist you in accomplishing
# a task. You are encouraged to test your code by using print statements and the
# Test Run button.
# -----------------------------------------------------------------------------
# Example string input. Use it to test your code.
example_input="John is connected to Bryant, Debra, Walter.\
John likes to play The Movie: The Game, The Legend of Corgi, Dinosaur Diner.\
Bryant is connected to Olive, Ollie, Freda, Mercedes.\
Bryant likes to play City Comptroller: The Fiscal Dilemma, Super Mushroom Man.\
Mercedes is connected to Walter, Robin, Bryant.\
Mercedes likes to play The Legend of Corgi, Pirates in Java Island, Seahorse Adventures.\
Olive is connected to John, Ollie.\
Olive likes to play The Legend of Corgi, Starfleet Commander.\
Debra is connected to Walter, Levi, Jennie, Robin.\
Debra likes to play Seven Schemers, Pirates in Java Island, Dwarves and Swords.\
Walter is connected to John, Levi, Bryant.\
Walter likes to play Seahorse Adventures, Ninja Hamsters, Super Mushroom Man.\
Levi is connected to Ollie, John, Walter.\
Levi likes to play The Legend of Corgi, Seven Schemers, City Comptroller: The Fiscal Dilemma.\
Ollie is connected to Mercedes, Freda, Bryant.\
Ollie likes to play Call of Arms, Dwarves and Swords, The Movie: The Game.\
Jennie is connected to Levi, John, Freda, Robin.\
Jennie likes to play Super Mushroom Man, Dinosaur Diner, Call of Arms.\
Robin is connected to Ollie.\
Robin likes to play Call of Arms, Dwarves and Swords.\
Freda is connected to Olive, John, Debra.\
Freda likes to play Starfleet Commander, Ninja Hamsters, Seahorse Adventures."
# -----------------------------------------------------------------------------
# create_data_structure(string_input):
# Parses a block of text (such as the one above) and stores relevant
# information into a data structure. You are free to choose and design any
# data structure you would like to use to manage the information.
#
# Arguments:
# string_input: block of text containing the network information
#
# You may assume that for all the test cases we will use, you will be given the
# connections and games liked for all users listed on the right-hand side of an
# 'is connected to' statement. For example, we will not use the string
# "A is connected to B.A likes to play X, Y, Z.C is connected to A.C likes to play X."
# as a test case for create_data_structure because the string does not
# list B's connections or liked games.
#
# The procedure should be able to handle an empty string (the string '') as input, in
# which case it should return a network with no users.
#
# Return:
# The newly created network data structure
def create_data_structure(string_input):
return network
# ----------------------------------------------------------------------------- #
# Note that the first argument to all procedures below is 'network' This is the #
# data structure that you created with your create_data_structure procedure, #
# though it may be modified as you add new users or new connections. Each #
# procedure below will then modify or extract information from 'network' #
# ----------------------------------------------------------------------------- #
# -----------------------------------------------------------------------------
# get_connections(network, user):
# Returns a list of all the connections that user has
#
# Arguments:
# network: the gamer network data structure
# user: a string containing the name of the user
#
# Return:
# A list of all connections the user has.
# - If the user has no connections, return an empty list.
# - If the user is not in network, return None.
def get_connections(network, user):
return []
# -----------------------------------------------------------------------------
# get_games_liked(network, user):
# Returns a list of all the games a user likes
#
# Arguments:
# network: the gamer network data structure
# user: a string containing the name of the user
#
# Return:
# A list of all games the user likes.
# - If the user likes no games, return an empty list.
# - If the user is not in network, return None.
def get_games_liked(network,user):
return []
# -----------------------------------------------------------------------------
# add_connection(network, user_A, user_B):
# Adds a connection from user_A to user_B. Make sure to check that both users
# exist in network.
#
# Arguments:
# network: the gamer network data structure
# user_A: a string with the name of the user the connection is from
# user_B: a string with the name of the user the connection is to
#
# Return:
# The updated network with the new connection added.
# - If a connection already exists from user_A to user_B, return network unchanged.
# - If user_A or user_B is not in network, return False.
def add_connection(network, user_A, user_B):
return network
# -----------------------------------------------------------------------------
# add_new_user(network, user, games):
# Creates a new user profile and adds that user to the network, along with
# any game preferences specified in games. Assume that the user has no
# connections to begin with.
#
# Arguments:
# network: the gamer network data structure
# user: a string containing the name of the user to be added to the network
# games: a list of strings containing the user's favorite games, e.g.:
# ['Ninja Hamsters', 'Super Mushroom Man', 'Dinosaur Diner']
#
# Return:
# The updated network with the new user and game preferences added. The new user
# should have no connections.
# - If the user already exists in network, return network *UNCHANGED* (do not change
# the user's game preferences)
def add_new_user(network, user, games):
return network
# -----------------------------------------------------------------------------
# get_secondary_connections(network, user):
# Finds all the secondary connections (i.e. connections of connections) of a
# given user.
#
# Arguments:
# network: the gamer network data structure
# user: a string containing the name of the user
#
# Return:
# A list containing the secondary connections (connections of connections).
# - If the user is not in the network, return None.
# - If a user has no primary connections to begin with, return an empty list.
#
# NOTE:
# It is OK if a user's list of secondary connections includes the user
# himself/herself. It is also OK if the list contains a user's primary
# connection that is a secondary connection as well.
def get_secondary_connections(network, user):
return []
# -----------------------------------------------------------------------------
# count_common_connections(network, user_A, user_B):
# Finds the number of people that user_A and user_B have in common.
#
# Arguments:
# network: the gamer network data structure
# user_A: a string containing the name of user_A
# user_B: a string containing the name of user_B
#
# Return:
# The number of connections in common (as an integer).
# - If user_A or user_B is not in network, return False.
def count_common_connections(network, user_A, user_B):
return 0
# -----------------------------------------------------------------------------
# find_path_to_friend(network, user_A, user_B):
# Finds a connections path from user_A to user_B. It has to be an existing
# path but it DOES NOT have to be the shortest path.
#
# Arguments:
# network: The network you created with create_data_structure.
# user_A: String holding the starting username ("Abe")
# user_B: String holding the ending username ("Zed")
#
# Return:
# A list showing the path from user_A to user_B.
# - If such a path does not exist, return None.
# - If user_A or user_B is not in network, return None.
#
# Sample output:
# >>> print find_path_to_friend(network, "Abe", "Zed")
# >>> ['Abe', 'Gel', 'Sam', 'Zed']
# This implies that Abe is connected with Gel, who is connected with Sam,
# who is connected with Zed.
#
# NOTE:
# You must solve this problem using recursion!
#
# Hints:
# - Be careful how you handle connection loops, for example, A is connected to B.
# B is connected to C. C is connected to B. Make sure your code terminates in
# that case.
# - If you are comfortable with default parameters, you might consider using one
# in this procedure to keep track of nodes already visited in your search. You
# may safely add default parameters since all calls used in the grading script
# will only include the arguments network, user_A, and user_B.
def find_path_to_friend(network, user_A, user_B):
# your RECURSIVE solution here!
return None
# Make-Your-Own-Procedure (MYOP)
# -----------------------------------------------------------------------------
# Your MYOP should either perform some manipulation of your network data
# structure (like add_new_user) or it should perform some valuable analysis of
# your network (like path_to_friend). Don't forget to comment your MYOP. You
# may give this procedure any name you want.
# Replace this with your own procedure! You can also uncomment the lines below
# to see how your code behaves. Have fun!
#net = create_data_structure(example_input)
#print net
#print get_connections(net, "Debra")
#print get_connections(net, "Mercedes")
#print get_games_liked(net, "John")
#print add_connection(net, "John", "Freda")
#print add_new_user(net, "Debra", [])
#print add_new_user(net, "Nick", ["Seven Schemers", "The Movie: The Game"]) # True
#print get_secondary_connections(net, "Mercedes")
#print count_common_connections(net, "Mercedes", "John")
#print find_path_to_friend(net, "John", "Ollie")